EU Pollinator Hub

Dataset Report
Unique identifier: BGDVR196.0.0
Title: B-GOOD Virus Sequences
Long title: Dataset from the B-GOOD project, containing GenBank accession numbers for virus sequences.
Status: Quality Validated
Current Version: v. 1.0
Published: 2025-03-17
Reviewed by: Rubinigg Michael as Data scientist
Citation proposal:
B-GOOD Bee Health Data Portal 2025 Report of dataset B-GOOD Virus Sequences, v. 1.0 [BGDVR196.0.0]. EU Pollinator Hub. [2026-02-24] app.pollinatorhub.eu
Compliance with FAIR* principles
Findable
Accessible
Interoperable
Reusable
See https://www.go-fair.org/fair-principles for more information about FAIR principles
Data Quality
Good

This document is intended for use by collaborators of the EU Pollinator Hub and may be passed on with the express permission of the leader of the consortium and for the purpose determined by the leader of the consortium.

Document history

Release

Version v. 1.0 released on 2025-03-17. Reviewed by Rubinigg Michael.

Revision

Table 1. List of revisions made to the document. Identifier of revision (No); date of revision (Date); description of revision (Description); reason for revision (Reason).
No Date Description Reason
1 2025-03-17 00:03:00 Initial release. n/a

Abbreviations

ABPV
Acute Bee Paralysis Virus
CBPV
Chronic Bee Paralysis Virus
CSV
Comma-Separated Values
DWV
Deformed Wing Virus
EU
European Union
EUPH
EU Pollinator Hub
INRAE
Institut national de recherche pour l'agriculture, l'alimentation et l'environnement (National Research Institute for Agriculture, Food and Environment)

Executive summary

Data overview:

The data was published by Bonjour-Dalmon A (INRAE) on the B-GOOD Bee Health Data Portal as part of the B-GOOD project (grant agreement 817622), funded under the EU Horizon 2020 Research and Innovation Programme. It contains the GenBank accession numbers of the virus sequences used to quantify several viruses (ABPV, CBPV, DWV) in honey bee samples.

Data value:

The objectives of the B-GOOD project were: (1) Facilitate decision making for beekeepers and other stakeholders by establishing ready-to-use tools for operationalising the HSI; (2) Test, standardise and validate methods for measuring and reporting selected indicators affecting bee health; (3) Explore the various socio-economic and ecological factors beyond bee health; (4) Foster an EU community to collect and share knowledge related to honey bees and their environment; (5) Engender a lasting learning and innovation system (LIS); (6) Minimise the impact of biotic and abiotic stressors.

Data description:

n/a

Data application:

Currently, the data integrated from the B-GOOD Bee Health Data Portal contains major issues and does not fully comply with the FAIR Guiding Principles for scientific data management and stewardship applied on the EU Pollinator Hub. More descriptive information about the context, quality and condition, or characteristics of the data (e.g. protocols, measurement devices used, units of the captured data, or any other details about the study) must be provided. More metadata in the form of accurate explanations of all variable names must be provided.

Unresolved issues:

n/a

Introduction

n/a

Material and methods

Data acquisition

All raw data files were downloaded from the B-GOOD Bee Health Data Portal on 2024-09-26 18:16:30.

List of raw data obtained from the data provider.

  1. File Accession_Numbers-GENBANK.xlsx, accessed on 2024-09-26 18:16:30, provided by B-GOOD Bee Health Data Portal

Metadata was obtained from the dataset's web page.

Table 2. List of raw data and metadata files included in the dataset. Identifier of table row (No); name of the file (File); the type of the file (Type); file contains data (D); file contains metadata (M); date of upload of the file to the EU Pollinator Hub (Arrival); number of data points contained within the file, if applicable (Data points); uploaded file size (File size).
No File Type D M Arrival Data points File size
1 ABPV_PREP_MR_241102.csv CSV - Comma-separated values Yes No 2024-11-02 11:11:50 121 1.37 KiB
2 CBPV_PREP_MR_241102.csv CSV - Comma-separated values Yes No 2024-11-02 11:11:15 352 3.66 KiB
3 DWV_PREP_MR_241102.csv CSV - Comma-separated values Yes No 2024-11-02 11:11:38 1,353 13.25 KiB
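
The Data points figures in Table 2 appear to equal the number of records multiplied by the number of columns of each table, as reported in the content analyses of this report (11 columns per table; 11, 32 and 123 records). This relation is inferred from the reported numbers and is not stated by the EU Pollinator Hub. A minimal Python check under that assumption:

```python
# Record counts and the column count are taken from the content analysis
# tables of this report; the "records x columns" relation is an assumption.
COLUMNS_PER_TABLE = 11
records = {"ABPV": 11, "CBPV": 32, "DWV": 123}
reported_data_points = {"ABPV": 121, "CBPV": 352, "DWV": 1353}

data_points = {name: n * COLUMNS_PER_TABLE for name, n in records.items()}

for name, value in data_points.items():
    status = "matches" if value == reported_data_points[name] else "differs from"
    print(f"{name}: {value} data points ({status} Table 2)")
```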

Data preparation

The file in the zip-archives was extracted using File Explorer (Microsoft Corporation, version 22H2).

The file Accession_Numbers-GENBANK.xlsx was opened with MS Excel (Microsoft Corporation, version 2409). The worksheets were exported to data files in CSV format (UTF-8 encoding) and imported into Notepad++ (version 8.7), where missing values were substituted by {NULL} using regular expressions. Dates were parsed to the required YYYY-MM-DD format using the Python script ParseDates.py.

Data was then exported to the respective preparatory files and uploaded to the EU Pollinator Hub according to SOP-017 (Dataset integration).
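
The two automated preparation steps (substituting missing values by {NULL} and parsing dates to YYYY-MM-DD) could be sketched in Python as follows. This is not the ParseDates.py script, which is not part of this report; the function names, the accepted input date formats and the sample rows are assumptions made for illustration only.

```python
import csv
import io
from datetime import datetime

NULL_TOKEN = "{NULL}"
# Assumed input formats; the formats used in the original worksheets are not documented.
ASSUMED_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def to_iso_date(value: str) -> str:
    """Return value as YYYY-MM-DD if it matches an assumed format, else unchanged."""
    for fmt in ASSUMED_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value


def prepare_rows(csv_text: str, date_columns: set[str]) -> list[dict[str, str]]:
    """Replace empty cells by {NULL} and normalise the given date columns."""
    prepared = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        clean = {}
        for column, cell in row.items():
            if cell is None or cell.strip() == "":
                clean[column] = NULL_TOKEN
            elif column in date_columns:
                clean[column] = to_iso_date(cell)
            else:
                clean[column] = cell
        prepared.append(clean)
    return prepared


# Hypothetical sample rows (not taken from the dataset).
sample = "Submission date,Size,Isolate\n09/12/2023,777,\n2023-12-09,,X1\n"
rows = prepare_rows(sample, {"Submission date"})
print(rows)
```

Values that match none of the assumed formats, such as the year-only entries that occur in Collection_date, are returned unchanged rather than guessed.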

Data validation

No data validation was performed.

Data analysis

No data analysis was performed.

Data description

Dataset

Table 3. Summary of tables belonging to the dataset. Table row identifier (No); name of the table (Table); description of the table (Description).
No Table Description
1 ABPV GenBank accession numbers for Acute Bee Paralysis Virus (ABPV)
2 CBPV GenBank accession numbers for Chronic Bee Paralysis Virus (CBPV)
3 DWV GenBank accession numbers for Deformed Wing Virus (DWV). Two regions of the genome were targeted to assess DWV diversity and…
Table 4. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier BGDVR196.0.0
Title B-GOOD Virus Sequences
Long title Dataset from the B-GOOD project, containing GenBank accession numbers for virus sequences.
Target IRI https://app.pollinatorhub.eu/dataset-discovery/BGDVR196.0.0
Licence CC0 1.0 Universal
DOI n/a
Created 2024-11-02
Published 2025-03-17
Contact n/a
Keywords ABPV, BQCV, CBPV, DWV, SBV
Data collection years n/a
Regions, the data was collected in Belgique/België, Deutschland, France, Nederland, Portugal, România, Schweiz/Suisse/Svizzera, United Kingdom
Abstract

Dataset containing the GenBank accession numbers of the virus sequences used to quantify several viruses (ABPV, CBPV, DWV) in honey bee samples. It was published by Bonjour-Dalmon A (INRAE) on the B-GOOD Bee Health Data Portal as part of the B-GOOD project (grant agreement 817622), funded under the EU Horizon 2020 Research and Innovation Programme.

Table 5. Standardised metadata of the data provider B-GOOD Bee Health Data Portal. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name B-GOOD Bee Health Data Portal
Url
Acronym B-GOOD
IRI https://app.pollinatorhub.eu/data-providers/b-good-bee-health-data-portal
Address https://b-good-project.eu
Country Belgium
Contact b-good-project.eu
Description

Project funded by the EU Horizon 2020 Research and Innovation Programme under grant agreement No 817622. Project website: https://b-good-project.eu

Tables

ABPV

Table 6. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier BGDVR196.ABPVA480.0
Name ABPV
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/BGDVR196.ABPVA480.0
Table Type File
Licence CC0 1.0 Universal
Description

GenBank accession numbers for Acute Bee Paralysis Virus (ABPV)

Metadata

Table 7. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit
Genbank submission ID

Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).

Integer number Integer [0.0.NTGER313]

n/a

GenBank accession numbers String pms:genBankAccession [0.0.GNBNK515]

n/a

Submission date

Date of submission to GenBank.

Date iso-8601:calendarDate [0.0.DATEA317]

n/a

Publication date

Not specified by the data provider. Presumably the date at which the GenBank submission was published.

Date iso-8601:calendarDate [0.0.DATEA317]

n/a

Sequence_ID

Not specified by the data provider.

String Text [0.0.TEXTA315]

n/a

Sequenced region

Not specified by the data provider.

String Text [0.0.TEXTA315]

n/a

Size

Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.

Integer number Integer [0.0.NTGER313]

bp

Isolate

Not specified by the data provider.

String Text [0.0.TEXTA315]

n/a

Host

Not specified by the data provider. Presumably the host species from which the virus has been isolated.

String dwc:scientificName [0.0.SCNTF503]

n/a

Country

Not specified by the data provider. Presumably the name of the region in which the virus has been isolated.

String Text [0.0.TEXTA315]

n/a

Collection_date

Not specified by the data provider. Presumably the approximate date at which the sample has been collected.

String Text [0.0.TEXTA315]

n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 8. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Genbank submission ID 7 - 7 2,743,798.7 2,743,639 2,743,639 2,743,890 2,743,890 2,743,890 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 18.2% )
GenBank accession numbers 8 - 8 n/a OR540712 n/a n/a n/a OR540722 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 11 ( 100.0% )
Submission date 10 - 10 n/a 2023-12-09 n/a n/a n/a 2023-12-09 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 9.1% )
Publication date 10 - 10 n/a 2024-03-18 n/a n/a n/a 2024-03-18 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 9.1% )
Sequence_ID 20 - 24 n/a 2247_UK_Aut2… n/a n/a n/a BG125-20_RO_… 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 11 ( 100.0% )
Sequenced region 5 - 5 n/a 3'ORF n/a n/a n/a 5'ORF 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 18.2% )
Size 3 - 3 796.9 777 789 796 808 815 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 10 ( 90.9% )
Isolate 4 - 8 n/a 2247 n/a n/a n/a BG125-20 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 7 ( 63.6% )
Host 14 - 14 n/a Apis mellife… n/a n/a n/a Apis mellife… 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 9.1% )
Country 6 - 14 n/a France n/a n/a n/a United Kingd… 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 3 ( 27.3% )
Collection_date 10 - 10 n/a 2020-05-22 n/a n/a n/a 2021-02-03 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 5 ( 45.5% )
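
As a worked illustration of the measures in Table 8, the Genbank submission ID column can be reconstructed from the reported figures: it holds 11 values with two distinct entries, and the reported mean implies four occurrences of 2743639 and seven of 2743890. The sketch below recomputes the measures with Python's statistics module. The function name describe_column and the choice of the inclusive quartile method are assumptions; the method used by the EU Pollinator Hub is not stated in this report.

```python
import statistics

# Reconstructed from Table 8 (the 4/7 split is inferred from the reported mean).
submission_ids = [2743639] * 4 + [2743890] * 7


def describe_column(values):
    """Compute the descriptive measures reported per column for integer values."""
    lengths = [len(str(v)) for v in values]
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    total = len(values)
    distinct = len(set(values))
    return {
        "range": (min(lengths), max(lengths)),
        "mean": round(statistics.mean(values), 1),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "total": total,
        "zero": sum(1 for v in values if v == 0),
        "distinct": distinct,
        "distinct_pct": round(100 * distinct / total, 1),
    }


measures = describe_column(submission_ids)
for name, value in measures.items():
    print(f"{name}: {value}")
```

For this column the recomputed values agree with the row for Genbank submission ID in Table 8; columns with more varied values may differ if another quartile method was used.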

Quality measures

Table 9. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Genbank submission ID
100.00%
18.18%
2743890 2743639
GenBank accession numbers
100.00%
100.00%
OR540712 OR540712
Submission date
100.00%
9.09%
2023-12-09 2023-12-09
Publication date
100.00%
9.09%
2024-03-18 2024-03-18
Sequence_ID
100.00%
100.00%
2247_UK_Aut2020_3ORF 2247_UK_Aut2020_3ORF
Sequenced region
100.00%
18.18%
5'ORF 3'ORF
Size
100.00%
90.91%
789 798
Isolate
100.00%
63.64%
2247 2253
Host
100.00%
9.09%
Apis mellifera Apis mellifera
Country
100.00%
27.27%
United Kingdom Romania
Collection_date
100.00%
45.45%
2020-05-22 2020-12-09

Changes made to preparatory file

None

Changes made to data

None

Unresolved issues

  1. The meaning of column Genbank submission ID can be inferred but is not explicitly stated. In particular, the pathway through which the sequence was submitted should be specified. The data provider is requested to make this information available.
  2. The meaning of column Publication date can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  3. It is unclear what column Sequence_ID describes. The data provider is requested to make this information available.
  4. The meaning of column Sequenced region can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  5. The meaning of column Size can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  6. The meaning of column Isolate can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  7. The meaning of column Host can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  8. The meaning of column Country can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  9. The meaning of column Collection_date can be inferred but is not explicitly stated. The data provider is requested to make this information available.

CBPV

Table 10. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier BGDVR196.CBPVA481.0
Name CBPV
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/BGDVR196.CBPVA481.0
Table Type File
Licence CC0 1.0 Universal
Description

GenBank accession numbers for Chronic Bee Paralysis Virus (CBPV)

Metadata

Table 11. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit
Genbank submission ID

Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).

Integer number Integer [0.0.NTGER313]

n/a

GenBank accession numbers String pms:genBankAccession [0.0.GNBNK515]

n/a

Submission date

Date of submission to GenBank.

Date iso-8601:calendarDate [0.0.DATEA317]

n/a

Publication date

Not specified by the data provider. Presumably the date at which the GenBank submission was published.

Date iso-8601:calendarDate [0.0.DATEA317]

n/a

Sequence_ID

Not specified by the data provider.

String Text [0.0.TEXTA315]

n/a

Sequenced region

Not specified by the data provider.

String Text [0.0.TEXTA315]

n/a

Size

Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.

Integer number Integer [0.0.NTGER313]

bp

Isolate

Not specified by the data provider.

String Text [0.0.TEXTA315]

n/a

Host

Not specified by the data provider. Presumably the host species from which the virus has been isolated.

String dwc:scientificName [0.0.SCNTF503]

n/a

Country

Not specified by the data provider. Presumably the name of the region in which the virus has been isolated.

String Text [0.0.TEXTA315]

n/a

Collection_date

Not specified by the data provider. Presumably the approximate date at which the sample has been collected.

String Text [0.0.TEXTA315]

n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 12. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Genbank submission ID 7 - 7 2,744,926.5 2,743,909 2,743,909 2,743,909 2,746,869 2,746,869 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 6.3% )
GenBank accession numbers 8 - 8 n/a OR582944 n/a n/a n/a OR584328 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 32 ( 100.0% )
Submission date 10 - 10 n/a 2023-09-21 n/a n/a n/a 2023-09-22 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 6.3% )
Publication date 10 - 10 n/a 2023-03-29 n/a n/a n/a 2024-03-28 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 6.3% )
Sequence_ID 9 - 18 n/a 134F3_FR_201… n/a n/a n/a Ruche24_FR_2… 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 32 ( 100.0% )
Sequenced region 4 - 4 n/a RNA1 n/a n/a n/a RdRp 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 6.3% )
Size 3 - 4 2,233.5 700 722 2,566 3,541.5 3,588 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 24 ( 75.0% )
Isolate 5 - 20 n/a 134.0f4 n/a n/a n/a FR20HM439 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 23 ( 71.9% )
Host 14 - 14 n/a Apis mellife… n/a n/a n/a Apis mellife… 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 3.1% )
Country 12 - 28 n/a France: Cent… n/a n/a n/a France: PACA 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 12.5% )
Collection_date 4 - 10 n/a 2014-07-18 n/a n/a n/a 2021-06-15 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 14 ( 43.8% )

Quality measures

Table 13. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Genbank submission ID
100.00%
6.25%
2743909 2746869
GenBank accession numbers
100.00%
100.00%
OR582944 OR582944
Submission date
100.00%
6.25%
2023-09-21 2023-09-22
Publication date
100.00%
6.25%
2024-03-28 2023-03-29
Sequence_ID
100.00%
100.00%
1_FR_2019 1_FR_2019
Sequenced region
100.00%
6.25%
RNA1 RdRp
Size
100.00%
75.00%
2566 3586
Isolate
100.00%
71.88%
134.0f4 CBPV 1.0 f4
Host
100.00%
3.13%
Apis mellifera Apis mellifera
Country
100.00%
12.50%
France: PACA France: Grand-Est
Collection_date
100.00%
43.75%
2016 2020-10-08

Changes made to preparatory file

None

Changes made to data

  1. Missing values (66 occurrences) were replaced by {NULL}.

Unresolved issues

  1. The meaning of column Genbank submission ID can be inferred but is not explicitly stated. In particular, the pathway through which the sequence was submitted should be specified. The data provider is requested to make this information available.
  2. The meaning of column Publication date can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  3. It is unclear what column Sequence_ID describes. The data provider is requested to make this information available.
  4. The meaning of column Sequenced region can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  5. The meaning of column Size can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  6. The meaning of column Isolate can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  7. The meaning of column Host can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  8. The meaning of column Country can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  9. The meaning of column Collection_date can be inferred but is not explicitly stated. The data provider is requested to make this information available.

DWV

Table 14. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier BGDVR196.DWVAB482.0
Name DWV
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/BGDVR196.DWVAB482.0
Table Type File
Licence CC0 1.0 Universal
Description

GenBank accession numbers for Deformed Wing Virus (DWV)

Two regions of the genome were targeted to assess DWV diversity and to look for putative recombination events: the 5’ end and the region encompassing genes coding for structural proteins, as well as the helicase-coding region. All DWV-A and/or DWV-B positive samples (n=116) were sequenced. Where possible, a long fragment covering approximately the entire 5’ half of the virus genome was also sequenced for some samples (n=10).

Metadata

Table 15. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit
Genbank submission ID

Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).

Integer number Integer [0.0.NTGER313]

n/a

GenBank accession numbers String pms:genBankAccession [0.0.GNBNK515]

n/a

Submission date

Date of submission to GenBank.

Date iso-8601:calendarDate [0.0.DATEA317]

n/a

Publication date

Not specified by the data provider. Presumably the date at which the GenBank submission was published.

Date iso-8601:calendarDate [0.0.DATEA317]

n/a

Sequence_ID

Not specified by the data provider.

String Text [0.0.TEXTA315]

n/a

Sequenced region

Not specified by the data provider.

String Text [0.0.TEXTA315]

n/a

Size

Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.

Integer number Integer [0.0.NTGER313]

bp

Isolate

Not specified by the data provider.

String Text [0.0.TEXTA315]

n/a

Host

Not specified by the data provider. Presumably the host species from which the virus has been isolated.

String dwc:scientificName [0.0.SCNTF503]

n/a

Country

Not specified by the data provider. Presumably the name of the country in which the virus has been isolated.

String Text [0.0.TEXTA315]

n/a

Collection_date

Not specified by the data provider. Presumably the approximate date at which the sample has been collected.

String Text [0.0.TEXTA315]

n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 16. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Genbank submission ID 7 - 7 2,734,179.0 2,733,328 2,733,349 2,733,349 2,733,374 2,743,492 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 3.3% )
GenBank accession numbers 0 - 8 n/a OR437235 n/a n/a n/a OR437298 123 59 ( 48.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 65 ( 52.8% )
Submission date 10 - 10 n/a 2023-11-08 n/a n/a n/a 2023-11-09 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 1.6% )
Publication date 10 - 10 n/a 2024-03-18 n/a n/a n/a 2024-12-01 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 1.6% )
Sequence_ID 15 - 22 n/a 2141B_UK_Spr… n/a n/a n/a BG5_ITR1_NL_… 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 121 ( 98.4% )
Sequenced region 3 - 11 n/a 5' complete n/a n/a n/a ITR 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 3.3% )
Size 3 - 4 1,496.6 668 1,124 1,186 1,341 6,450 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 81 ( 65.9% )
Isolate 3 - 5 n/a 2141 n/a n/a n/a BG5 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 66 ( 53.7% )
Host 14 - 14 n/a Apis mellife… n/a n/a n/a Apis mellife… 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 0.8% )
Country 6 - 14 n/a Belgium n/a n/a n/a United Kingd… 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 8 ( 6.5% )
Collection_date 4 - 10 n/a 2020 n/a n/a n/a 2021-11-23 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 33 ( 26.8% )

Quality measures

Table 17. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Genbank submission ID
100.00%
3.25%
2733374 2743492
GenBank accession numbers
52.03%
52.85%
n/a OR437235
Submission date
100.00%
1.63%
2023-11-08 2023-11-09
Publication date
100.00%
1.63%
2024-12-01 2024-03-18
Sequence_ID
100.00%
98.37%
2261_FR_Aut2020 2141B_UK_Spr2020
Sequenced region
100.00%
3.25%
A/B/R 5' complete
Size
100.00%
65.85%
1179 1255
Isolate
100.00%
53.66%
2689 2141
Host
100.00%
0.81%
Apis mellifera Apis mellifera
Country
100.00%
6.50%
France Romania
Collection_date
100.00%
26.83%
2020-10-16 2020-05-02
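
The completeness and uniqueness figures in Table 17 are consistent with completeness being the share of non-missing values and uniqueness being the share of distinct values (counting NULL as one value) among all records. This reading is inferred from the reported counts, not from a documented formula. A minimal Python sketch, using placeholder strings in place of the real accession numbers and the hypothetical function name quality_measures:

```python
def quality_measures(values):
    """Return (completeness, uniqueness) in percent; None marks a missing value."""
    total = len(values)
    present = sum(1 for v in values if v is not None)
    distinct = len(set(values))  # None counts as one distinct value
    return round(100 * present / total, 2), round(100 * distinct / total, 2)


# GenBank accession numbers column of the DWV table, per Table 16:
# 123 records, 59 missing, 65 distinct values including NULL.
# The strings are placeholders, not real accession numbers.
column = [f"placeholder-{i}" for i in range(64)] + [None] * 59

completeness, uniqueness = quality_measures(column)
print(f"completeness: {completeness:.2f}%, uniqueness: {uniqueness:.2f}%")
```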

Changes made to preparatory file

None

Changes made to data

  1. Missing values (307 occurrences) were replaced by {NULL}.

Unresolved issues

  1. The meaning of column Genbank submission ID can be inferred but is not explicitly stated. In particular, the pathway through which the sequence was submitted should be specified. The data provider is requested to make this information available.
  2. The meaning of column Publication date can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  3. It is unclear what column Sequence_ID describes. The data provider is requested to make this information available.
  4. The meaning of column Sequenced region can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  5. The meaning of column Size can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  6. The meaning of column Isolate can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  7. The meaning of column Host can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  8. The meaning of column Country can be inferred but is not explicitly stated. The data provider is requested to make this information available.
  9. The meaning of column Collection_date can be inferred but is not explicitly stated. The data provider is requested to make this information available.

References

  1. Bonjour-Dalmon A. 2023 GenBank accession numbers for virus sequences. B-GOOD Bee Health Data Portal. [2024-11-02] beehealthdata.org

Annex 1: Table column reports

Table: ABPV

Column: Genbank submission ID

Table 18. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Genbank submission ID
Description

Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

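Table 19. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).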
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Genbank submission ID 7 - 7 2,743,798.7 2,743,639 2,743,639 2,743,890 2,743,890 2,743,890 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 18.2% )
Table 20. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Genbank submission ID
100.00%
18.18%
2743890 2743639

Data Distribution Top 20

Figure 1. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 2. Distribution of values in the column.

Outliers

Figure 3. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 4. Visualization of completeness of the data in the column.

Uniqueness

Figure 5. Visualization of uniqueness of the data in the column.

Column: GenBank accession numbers

Table 21. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name GenBank accession numbers
Description
Data type String
Descriptor pms:genBankAccession [UID:0.0.GNBNK515]
Descriptor description

The unique identifier for a sequence record. An accession number applies to the complete record and is usually a combination of a letter(s) and numbers, such as a single letter followed by five digits (e.g., U12345) or two letters followed by six digits (e.g., AF123456). Some accessions might be longer, depending on the type of sequence record.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.GNBNK515
Unit

n/a
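
The two accession patterns named in the descriptor description (one letter followed by five digits, two letters followed by six digits) can be checked with a regular expression. The sketch below covers only these two patterns; as the description notes, longer accessions exist and would need additional patterns. The function name looks_like_accession is hypothetical.

```python
import re

# Only the two patterns named in the descriptor description are covered.
ACCESSION_PATTERN = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")


def looks_like_accession(value: str) -> bool:
    """Return True if value matches one of the two described accession patterns."""
    return ACCESSION_PATTERN.fullmatch(value) is not None


for candidate in ("U12345", "AF123456", "OR540712", "{NULL}"):
    print(candidate, looks_like_accession(candidate))
```

Such a check would flag the {NULL} placeholders used for missing values, so these should be filtered out before validation.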

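Table 22. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).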
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
GenBank accession numbers 8 - 8 n/a OR540712 n/a n/a n/a OR540722 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 11 ( 100.0% )
Table 23. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
GenBank accession numbers
100.00%
100.00%
OR540712 OR540712

Completeness

Figure 6. Visualization of completeness of the data in the column.

Uniqueness

Figure 7. Visualization of uniqueness of the data in the column.

Column: Submission date

Table 24. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Submission date
Description

Date of submission to GenBank.

Data type Date
Descriptor iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description

particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit

n/a

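Table 25. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).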
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Submission date 10 - 10 n/a 2023-12-09 n/a n/a n/a 2023-12-09 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 9.1% )
Table 26. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Submission date 100.00% 9.09% 2023-12-09 2023-12-09

Data Distribution Top 20

Figure 8. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 9. Visualization of completeness of the data in the column.

Uniqueness

Figure 10. Visualization of uniqueness of the data in the column.
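
The completeness and uniqueness percentages in the quality table appear to follow from the counts in the statistics table. The sketch below reproduces them under two assumptions that the report does not state explicitly: completeness is the share of non-missing values, and uniqueness is the share of distinct values among all values. Both function names are hypothetical.

```python
def completeness(total: int, missing: int) -> float:
    """Percentage of values that are not missing (assumed definition)."""
    return 100.0 * (total - missing) / total


def uniqueness(total: int, distinct: int) -> float:
    """Percentage of distinct values among all values (assumed definition)."""
    return 100.0 * distinct / total


# Counts reported for the Submission date column: 11 values, 0 missing, 1 distinct.
print(f"{completeness(11, 0):.2f}%")
print(f"{uniqueness(11, 1):.2f}%")
```

With these definitions, a column holding a single repeated value still scores a non-zero uniqueness of 1/Total.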

Column: Publication date

Table 27. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Publication date
Description

Not specified by the data provider. Presumably the date at which the GenBank submission was published.

Data type Date
Descriptor iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description

particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit

n/a

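Table 28. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).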
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Publication date 10 - 10 n/a 2024-03-18 n/a n/a n/a 2024-03-18 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 9.1% )
Table 29. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Publication date 100.00% 9.09% 2024-03-18 2024-03-18

Data Distribution Top 20

Figure 11. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 12. Visualization of completeness of the data in the column.

Uniqueness

Figure 13. Visualization of uniqueness of the data in the column.
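
Because both date columns use the iso-8601:calendarDate descriptor, the interval between submission and publication can be derived directly. This is a minimal sketch using the single values reported for this table (submission 2023-12-09, publication 2024-03-18); the variable names are illustrative.

```python
from datetime import date

# Single distinct values reported for the two date columns of this table.
submitted = date.fromisoformat("2023-12-09")
published = date.fromisoformat("2024-03-18")

# Subtracting two dates yields a timedelta; .days is the whole-day difference.
days_to_publication = (published - submitted).days
print(days_to_publication)
```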

Column: Sequence_ID

Table 30. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Sequence_ID
Description

Not specified by the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

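Table 31. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).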
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Sequence_ID 20 - 24 n/a 2247_UK_Aut2… n/a n/a n/a BG125-20_RO_… 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 11 ( 100.0% )
Table 32. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Sequence_ID 100.00% 100.00% 2247_UK_Aut2020_3ORF 2247_UK_Aut2020_3ORF

Completeness

Figure 14. Visualization of completeness of the data in the column.

Uniqueness

Figure 15. Visualization of uniqueness of the data in the column.

Column: Sequenced region

Table 33. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Sequenced region
Description

Not specified by the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

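Table 34. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).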
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Sequenced region 5 - 5 n/a 3'ORF n/a n/a n/a 5'ORF 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 18.2% )
Table 35. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Sequenced region 100.00% 18.18% 5'ORF 3'ORF

Data Distribution Top 20

Figure 16. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 17. Visualization of completeness of the data in the column.

Uniqueness

Figure 18. Visualization of uniqueness of the data in the column.

Column: Size

Table 36. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Size
Description

Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

bp

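Table 37. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).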
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Size 3 - 3 796.9 777 789 796 808 815 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 10 ( 90.9% )
Table 38. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Size 100.00% 90.91% 789 798

Continuous Data Distribution

Figure 19. Distribution of values in the column.

Outliers

Figure 20. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 21. Visualization of completeness of the data in the column.

Uniqueness

Figure 22. Visualization of uniqueness of the data in the column.
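
The quartiles reported for this column allow a quick plausibility check for outliers. The report does not state which outlier rule underlies the figure, so the sketch below assumes the common Tukey rule (values beyond 1.5 times the interquartile range from Q1 or Q3); the function name tukey_fences is hypothetical.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the lower and upper fences of the Tukey outlier rule (assumed rule)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Q1 and Q3 (in bp) as reported for the Size column.
lower, upper = tukey_fences(789, 808)

# Reported minimum and maximum (in bp); values outside the fences would be outliers.
minimum, maximum = 777, 815
has_outliers = minimum < lower or maximum > upper
print(lower, upper, has_outliers)
```

Under this assumed rule, the reported minimum and maximum both lie inside the fences, so no value in the column would be flagged.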

Column: Isolate

Table 39. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Isolate
Description

Not specified by the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

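Table 40. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).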
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Isolate 4 - 8 n/a 2247 n/a n/a n/a BG125-20 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 7 ( 63.6% )
Table 41. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Isolate 100.00% 63.64% 2247 2253

Completeness

Figure 23. Visualization of completeness of the data in the column.

Uniqueness

Figure 24. Visualization of uniqueness of the data in the column.

Column: Host

Table 42. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Host
Description

Not specified by the data provider. Presumably the host species from which the virus was isolated.

Data type String
Descriptor dwc:scientificName [UID:0.0.SCNTF503]
Descriptor description

The full scientific name, with authorship and date information if known.

Descriptor target IRI http://rs.tdwg.org/dwc/terms/scientificName
Unit

n/a

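Table 43. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).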
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Host 14 - 14 n/a Apis mellife… n/a n/a n/a Apis mellife… 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 9.1% )
Table 44. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Host 100.00% 9.09% Apis mellifera Apis mellifera

Data Distribution Top 20

Figure 25. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 26. Visualization of completeness of the data in the column.

Uniqueness

Figure 27. Visualization of uniqueness of the data in the column.

Column: Country

Table 45. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Country
Description

Not specified by the data provider. Presumably the name of the country in which the virus was isolated.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

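Table 46. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).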
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Country 6 - 14 n/a France n/a n/a n/a United Kingd… 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 3 ( 27.3% )
Table 47. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Country 100.00% 27.27% United Kingdom Romania

Data Distribution Top 20

Figure 28. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 29. Visualization of completeness of the data in the column.

Uniqueness

Figure 30. Visualization of uniqueness of the data in the column.

Column: Collection_date

Table 48. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Collection_date
Description

Not specified by the data provider. Presumably the approximate date on which the sample was collected.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

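Table 49. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).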
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Collection_date 10 - 10 n/a 2020-05-22 n/a n/a n/a 2021-02-03 11 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 5 ( 45.5% )
Table 50. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Collection_date 100.00% 45.45% 2020-05-22 2020-12-09

Completeness

Figure 31. Visualization of completeness of the data in the column.

Uniqueness

Figure 32. Visualization of uniqueness of the data in the column.
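
Collection_date is stored as a String, yet the reported values look like ISO 8601 calendar dates. For such fixed-width values, plain text ordering coincides with chronological ordering, which is why the reported minimum and maximum are meaningful even without a Date type. The sketch below illustrates this with the three values that appear in the tables above; treating every value in the column as a well-formed ISO date is an assumption.

```python
from datetime import date

# The three Collection_date values that appear in the statistics and quality tables.
collection_dates = ["2021-02-03", "2020-05-22", "2020-12-09"]

# Sort once as plain text and once chronologically after parsing.
as_text = sorted(collection_dates)
as_dates = sorted(collection_dates, key=date.fromisoformat)

print(as_text == as_dates)
print(min(collection_dates), max(collection_dates))
```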

Table: CBPV

Column: Genbank submission ID

Table 51. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Genbank submission ID
Description

Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

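Table 52. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).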
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Genbank submission ID 7 - 7 2,744,926.5 2,743,909 2,743,909 2,743,909 2,746,869 2,746,869 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 6.3% )
Table 53. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Genbank submission ID 100.00% 6.25% 2743909 2746869

Data Distribution Top 20

Figure 33. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 34. Distribution of values in the column.

Outliers

Figure 35. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 36. Visualization of completeness of the data in the column.

Uniqueness

Figure 37. Visualization of uniqueness of the data in the column.
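
Since this column holds only two distinct values, the reported mean is enough to recover how many rows carry each submission ID. The following is a worked sketch of that arithmetic, assuming the reported mean of 2,744,926.5 is exact rather than rounded; the variable names are illustrative.

```python
# Values reported for the column in the statistics and quality tables.
total = 32
low_id, high_id = 2_743_909, 2_746_869
reported_mean = 2_744_926.5

# With two distinct values: mean = low_id + (n_high / total) * (high_id - low_id).
# Solving for n_high gives the number of rows carrying the larger ID.
n_high = (reported_mean - low_id) * total / (high_id - low_id)
n_low = total - n_high
print(n_high, n_low)
```

The result is consistent with the quality table, which lists 2743909 as the most common value and 2746869 as the least common one.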

Column: GenBank accession numbers

Table 54. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name GenBank accession numbers
Description
Data type String
Descriptor pms:genBankAccession [UID:0.0.GNBNK515]
Descriptor description

The unique identifier for a sequence record. An accession number applies to the complete record and is usually a combination of a letter(s) and numbers, such as a single letter followed by five digits (e.g., U12345) or two letters followed by six digits (e.g., AF123456). Some accessions might be longer, depending on the type of sequence record.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.GNBNK515
Unit

n/a

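Table 55. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).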
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
GenBank accession numbers 8 - 8 n/a OR582944 n/a n/a n/a OR584328 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 32 ( 100.0% )
Table 56. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
GenBank accession numbers 100.00% 100.00% OR582944 OR582944

Completeness

Figure 38. Visualization of completeness of the data in the column.

Uniqueness

Figure 39. Visualization of uniqueness of the data in the column.

Column: Submission date

Table 57. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Submission date
Description

Date of submission to GenBank.

Data type Date
Descriptor iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description

particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit

n/a

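Table 58. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).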
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Submission date 10 - 10 n/a 2023-09-21 n/a n/a n/a 2023-09-22 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 6.3% )
Table 59. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Submission date 100.00% 6.25% 2023-09-21 2023-09-22

Data Distribution Top 20

Figure 40. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 41. Visualization of completeness of the data in the column.

Uniqueness

Figure 42. Visualization of uniqueness of the data in the column.

Column: Publication date

Table 60. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Publication date
Description

Not specified by the data provider. Presumably the date at which the GenBank submission was published.

Data type Date
Descriptor iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description

particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit

n/a

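Table 61. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).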
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Publication date 10 - 10 n/a 2023-03-29 n/a n/a n/a 2024-03-28 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 6.3% )
Table 62. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Publication date 100.00% 6.25% 2024-03-28 2023-03-29

Data Distribution Top 20

Figure 43. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 44. Visualization of completeness of the data in the column.

Uniqueness

Figure 45. Visualization of uniqueness of the data in the column.

Column: Sequence_ID

Table 63. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Sequence_ID
Description

Not specified by the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

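Table 64. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).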
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Sequence_ID 9 - 18 n/a 134F3_FR_201… n/a n/a n/a Ruche24_FR_2… 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 32 ( 100.0% )
Table 65. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Sequence_ID 100.00% 100.00% 1_FR_2019 1_FR_2019

Completeness

Figure 46. Visualization of completeness of the data in the column.

Uniqueness

Figure 47. Visualization of uniqueness of the data in the column.

Column: Sequenced region

Table 66. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Sequenced region
Description

Not specified by the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

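Table 67. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).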
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Sequenced region 4 - 4 n/a RNA1 n/a n/a n/a RdRp 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 6.3% )
Table 68. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Sequenced region 100.00% 6.25% RNA1 RdRp

Data Distribution Top 20

Figure 48. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 49. Visualization of completeness of the data in the column.

Uniqueness

Figure 50. Visualization of uniqueness of the data in the column.

Column: Size

Table 69. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Size
Description

Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

bp

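Table 70. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).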
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Size 3 - 4 2,233.5 700 722 2,566 3,541.5 3,588 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 24 ( 75.0% )
Table 71. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Size 100.00% 75.00% 2566 3586

Continuous Data Distribution

Figure 51. Distribution of values in the column.

Outliers

Figure 52. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 53. Visualization of completeness of the data in the column.

Uniqueness

Figure 54. Visualization of uniqueness of the data in the column.

Column: Isolate

Table 72. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Isolate
Description

Not specified by the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

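Table 73. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); largest value (Maximum); number of values (Total); number and percentage of missing, zero, blank, and distinct values (Missing, Zero, Blank, Distinct).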
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Isolate 5 - 20 n/a 134.0f4 n/a n/a n/a FR20HM439 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 23 ( 71.9% )
Table 74. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Isolate 100.00% 71.88% 134.0f4 CBPV 1.0 f4

Completeness

Figure 55. Visualization of completeness of the data in the column.

Uniqueness

Figure 56. Visualization of uniqueness of the data in the column.

Column: Host

Table 75. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Host
Description

Not specified by the data provider. Presumably the host species from which the virus has been isolated.

Data type String
Descriptor dwc:scientificName [UID:0.0.SCNTF503]
Descriptor description

The full scientific name, with authorship and date information if known.

Descriptor target IRI http://rs.tdwg.org/dwc/terms/scientificName
Unit

n/a

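Table 76. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).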
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Host 14 - 14 n/a Apis mellife… n/a n/a n/a Apis mellife… 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 3.1% )
Table 77. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Host 100.00% 3.13% Apis mellifera Apis mellifera

Data Distribution Top 20

Figure 57. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 58. Visualization of completeness of the data in the column.

Uniqueness

Figure 59. Visualization of uniqueness of the data in the column.

Column: Country

Table 78. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Country
Description

Not specified by the data provider. Presumably the country and region (for example, "France: PACA") in which the virus has been isolated.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

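Table 79. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).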
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Country 12 - 28 n/a France: Cent… n/a n/a n/a France: PACA 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 12.5% )
Table 80. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Country 100.00% 12.50% France: PACA France: Grand-Est

Data Distribution Top 20

Figure 60. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 61. Visualization of completeness of the data in the column.

Uniqueness

Figure 62. Visualization of uniqueness of the data in the column.
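
The values in this column combine a country and a region in a single string (for example, "France: PACA"). The following is a minimal sketch of how such values might be split for analysis, assuming the colon is always the separator between the two parts. The function name `split_country` is hypothetical and not part of the dataset or of the EU Pollinator Hub.

```python
from typing import Optional, Tuple


def split_country(value: str) -> Tuple[str, Optional[str]]:
    """Split 'Country: Region' into (country, region).

    Assumes a colon separates the two parts; the region is None
    when no colon (or nothing after it) is present.
    """
    country, _sep, region = value.partition(":")
    region = region.strip()
    return country.strip(), region or None


# Values reported in Table 80.
for raw in ("France: PACA", "France: Grand-Est"):
    print(split_country(raw))
```

Values without a region, such as a bare country name, are returned with `None` in the second position.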

Column: Collection_date

Table 81. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Collection_date
Description

Not specified by the data provider. Presumably the approximate date at which the sample has been collected.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

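Table 82. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).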
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Collection_date 4 - 10 n/a 2014-07-18 n/a n/a n/a 2021-06-15 32 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 14 ( 43.8% )
Table 83. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Collection_date 100.00% 43.75% 2016 2020-10-08

Completeness

Figure 63. Visualization of completeness of the data in the column.

Uniqueness

Figure 64. Visualization of uniqueness of the data in the column.
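
The column is stored as a string because its values differ in precision: the reported lengths range from 4 to 10 characters, with year-only values such as "2016" alongside full dates such as "2014-07-18". The following is a minimal sketch of how such values might be normalised, assuming year-only values, full ISO 8601 dates and possibly a 7-character year-month form. The year-month form is an assumption and is not confirmed by the report; the function name is hypothetical.

```python
from datetime import date
from typing import Tuple


def parse_collection_date(value: str) -> Tuple[date, str]:
    """Return (earliest possible date, precision) for a Collection_date value."""
    text = value.strip()
    if len(text) == 4 and text.isdigit():
        # Year only, e.g. "2016": map to 1 January of that year.
        return date(int(text), 1, 1), "year"
    if len(text) == 7:
        # Assumed "YYYY-MM" form: map to the first day of the month.
        year, month = text.split("-")
        return date(int(year), int(month), 1), "month"
    # Full calendar date, e.g. "2014-07-18".
    return date.fromisoformat(text), "day"


for raw in ("2016", "2014-07-18", "2021-06-15"):
    print(raw, "->", parse_collection_date(raw))
```

Keeping the precision next to the parsed date avoids treating a year-only value as if the sample had been collected on 1 January.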

Table: DWV

Column: Genbank submission ID

Table 84. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Genbank submission ID
Description

Not specified by the data provider. Presumably the identifier of the pathway through which the sequence was submitted (direct submission to NCBI).

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

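Table 85. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).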
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Genbank submission ID 7 - 7 2,734,179.0 2,733,328 2,733,349 2,733,349 2,733,374 2,743,492 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 3.3% )
Table 86. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Genbank submission ID 100.00% 3.25% 2733374 2743492

Data Distribution Top 20

Figure 65. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 66. Distribution of values in the column.

Outliers

Figure 67. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 68. Visualization of completeness of the data in the column.

Uniqueness

Figure 69. Visualization of uniqueness of the data in the column.

Column: GenBank accession numbers

Table 87. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name GenBank accession numbers
Description
Data type String
Descriptor pms:genBankAccession [UID:0.0.GNBNK515]
Descriptor description

The unique identifier for a sequence record. An accession number applies to the complete record and is usually a combination of a letter(s) and numbers, such as a single letter followed by five digits (e.g., U12345) or two letters followed by six digits (e.g., AF123456). Some accessions might be longer, depending on the type of sequence record.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.GNBNK515
Unit

n/a

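Table 88. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).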
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
GenBank accession numbers 0 - 8 n/a OR437235 n/a n/a n/a OR437298 123 59 ( 48.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 65 ( 52.8% )
Table 89. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
GenBank accession numbers 52.03% 52.85% n/a OR437235

Completeness

Figure 70. Visualization of completeness of the data in the column.

Uniqueness

Figure 71. Visualization of uniqueness of the data in the column.
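
Because 59 of the 123 values in this column are missing, it can help to separate missing entries from well-formed accession numbers before looking records up. The following is a minimal sketch, assuming only the two patterns named in the descriptor description (one letter followed by five digits, or two letters followed by six digits). Longer accession formats exist and are reported here as "unrecognised" rather than as errors. The function name is hypothetical.

```python
import re
from typing import Optional

# Patterns taken from the descriptor description: U12345 or AF123456.
ACCESSION_RE = re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")


def classify_accession(value: Optional[str]) -> str:
    """Classify a cell as 'missing', 'valid' or 'unrecognised'."""
    if value is None or not value.strip():
        return "missing"
    if ACCESSION_RE.fullmatch(value.strip()):
        return "valid"
    return "unrecognised"


# First and last accession numbers reported in Table 88, plus an empty cell.
for cell in ("OR437235", "OR437298", ""):
    print(repr(cell), "->", classify_accession(cell))
```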

Column: Submission date

Table 90. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Submission date
Description

Date of submission to GenBank.

Data type Date
Descriptor iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description

particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit

n/a

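Table 91. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).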
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Submission date 10 - 10 n/a 2023-11-08 n/a n/a n/a 2023-11-09 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 1.6% )
Table 92. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Submission date 100.00% 1.63% 2023-11-08 2023-11-09

Data Distribution Top 20

Figure 72. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 73. Visualization of completeness of the data in the column.

Uniqueness

Figure 74. Visualization of uniqueness of the data in the column.

Column: Publication date

Table 93. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Publication date
Description

Not specified by the data provider. Presumably the date at which the GenBank submission was published.

Data type Date
Descriptor iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description

particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit

n/a

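Table 94. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).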
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Publication date 10 - 10 n/a 2024-03-18 n/a n/a n/a 2024-12-01 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 1.6% )
Table 95. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Publication date 100.00% 1.63% 2024-12-01 2024-03-18

Data Distribution Top 20

Figure 75. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 76. Visualization of completeness of the data in the column.

Uniqueness

Figure 77. Visualization of uniqueness of the data in the column.

Column: Sequence_ID

Table 96. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Sequence_ID
Description

Not specified by the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

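Table 97. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).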
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Sequence_ID 15 - 22 n/a 2141B_UK_Spr… n/a n/a n/a BG5_ITR1_NL_… 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 121 ( 98.4% )
Table 98. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Sequence_ID 100.00% 98.37% 2261_FR_Aut2020 2141B_UK_Spr2020

Completeness

Figure 78. Visualization of completeness of the data in the column.

Uniqueness

Figure 79. Visualization of uniqueness of the data in the column.
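
The data provider does not describe how the Sequence_ID values are built. Two of the values shown above ("2261_FR_Aut2020" and "2141B_UK_Spr2020") look like a sample label, a two-letter country code, and a season followed by a year, joined by underscores. The following is a rough sketch under that assumption only. Other identifiers in the column follow a different layout and are returned as `None`. The field names and the function name are hypothetical.

```python
import re
from typing import Dict, Optional, Union

# Assumed layout: <sample>_<country code>_<season><year>, e.g. 2261_FR_Aut2020.
SEQUENCE_ID_RE = re.compile(
    r"(?P<sample>[^_]+)_(?P<country>[A-Z]{2})_(?P<season>[A-Za-z]+)(?P<year>\d{4})"
)


def parse_sequence_id(value: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split a Sequence_ID into its assumed parts, or return None."""
    match = SEQUENCE_ID_RE.fullmatch(value.strip())
    if match is None:
        return None  # layout not covered by the assumption above
    parts: Dict[str, Union[str, int]] = dict(match.groupdict())
    parts["year"] = int(match.group("year"))
    return parts


for raw in ("2261_FR_Aut2020", "2141B_UK_Spr2020"):
    print(raw, "->", parse_sequence_id(raw))
```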

Column: Sequenced region

Table 99. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Sequenced region
Description

Not specified by the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

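Table 100. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).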
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Sequenced region 3 - 11 n/a 5' complete n/a n/a n/a ITR 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 4 ( 3.3% )
Table 101. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Sequenced region 100.00% 3.25% A/B/R 5' complete

Data Distribution Top 20

Figure 80. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 81. Visualization of completeness of the data in the column.

Uniqueness

Figure 82. Visualization of uniqueness of the data in the column.

Column: Size

Table 102. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Size
Description

Not specified by the data provider. Presumably the size of the coding sequence submitted to GenBank.

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

bp

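Table 103. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).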
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Size 3 - 4 1,496.6 668 1,124 1,186 1,341 6,450 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 81 ( 65.9% )
Table 104. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Size 100.00% 65.85% 1179 1255

Continuous Data Distribution

Figure 83. Distribution of values in the column.

Outliers

Figure 84. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 85. Visualization of completeness of the data in the column.

Uniqueness

Figure 86. Visualization of uniqueness of the data in the column.
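
The report does not state which rule is used to mark outliers in Figure 84. As a rough sketch, assuming the common 1.5 x IQR (Tukey) rule, the quartiles reported in Table 103 (Q1 = 1,124 bp, Q3 = 1,341 bp) give the fences below, against which the reported minimum and maximum can be checked. The function name is illustrative.

```python
from typing import Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences at k times the interquartile range."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles, minimum and maximum as reported for the Size column (in bp).
lower, upper = tukey_fences(q1=1124, q3=1341)
print(f"Fences: {lower} to {upper} bp")

for label, size in (("minimum", 668), ("maximum", 6450)):
    outside = size < lower or size > upper
    print(f"{label} {size} bp:", "outside fences" if outside else "inside fences")
```

Under this assumed rule both the shortest and the longest sequence fall outside the fences, which is consistent with a mean (1,496.6 bp) that lies above the third quartile.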

Column: Isolate

Table 105. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Isolate
Description

Not specified by the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

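Table 106. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).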
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Isolate 3 - 5 n/a 2141 n/a n/a n/a BG5 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 66 ( 53.7% )
Table 107. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Isolate 100.00% 53.66% 2689 2141

Completeness

Figure 87. Visualization of completeness of the data in the column.

Uniqueness

Figure 88. Visualization of uniqueness of the data in the column.

Column: Host

Table 108. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Host
Description

Not specified by the data provider. Presumably the host species from which the virus has been isolated.

Data type String
Descriptor dwc:scientificName [UID:0.0.SCNTF503]
Descriptor description

The full scientific name, with authorship and date information if known.

Descriptor target IRI http://rs.tdwg.org/dwc/terms/scientificName
Unit

n/a

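Table 109. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).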
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Host 14 - 14 n/a Apis mellife… n/a n/a n/a Apis mellife… 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1 ( 0.8% )
Table 110. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Host 100.00% 0.81% Apis mellifera Apis mellifera

Data Distribution Top 20

Figure 89. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 90. Visualization of completeness of the data in the column.

Uniqueness

Figure 91. Visualization of uniqueness of the data in the column.

Column: Country

Table 111. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Country
Description

Not specified by the data provider. Presumably the name of the country in which the virus has been isolated.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

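Table 112. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).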
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Country 6 - 14 n/a Belgium n/a n/a n/a United Kingd… 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 8 ( 6.5% )
Table 113. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Country 100.00% 6.50% France Romania

Data Distribution Top 20

Figure 92. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 93. Visualization of completeness of the data in the column.

Uniqueness

Figure 94. Visualization of uniqueness of the data in the column.

Column: Collection_date

Table 114. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name Collection_date
Description

Not specified by the data provider. Presumably the approximate date at which the sample has been collected.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

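Table 115. Statistical analysis of the table. Column name (Column Name); range of value lengths in characters (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).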
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
Collection_date 4 - 10 n/a 2020 n/a n/a n/a 2021-11-23 123 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 33 ( 26.8% )
Table 116. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
Collection_date 100.00% 26.83% 2020-10-16 2020-05-02

Completeness

Figure 95. Visualization of completeness of the data in the column.

Uniqueness

Figure 96. Visualization of uniqueness of the data in the column.