EU Pollinator Hub
BeeLife European Beekeeping Coordination

Avenue Louise 209/7, 1050 Brussels, Belgium
info@pollinatorhub.eu • www.pollinatorhub.eu • +32 (0) 486 973 920
Dataset Report
UID: BGDBC176.0.0
Name: B-GOOD Bee Counter Data
Title: Dataset from the B-GOOD project, containing data from bee counters on daily exits and entrances of bees from colonies.
Status: Approved
Version: v. 1.0
Date: 2024-11-21
Author: Rubinigg Michael
Citation proposal:
Rubinigg M. 2024 Report of dataset B-GOOD Bee Counter Data, v. 1.0 [BGDBC176.0.0]. EU Pollinator Hub. [2025-03-30] app.pollinatorhub.eu
Compliance with FAIR* principles
Findable
Accessible
Interoperable
Reusable
See https://www.go-fair.org/fair-principles for more information about FAIR principles
Data Quality
Requires major revision
This document is intended for use by collaborators of the EU Pollinator Hub and may be passed on with the express permission of the leader of the consortium and for the purpose determined by the leader of the consortium.

Document History

Release

Version v. 1.0 released on 2025-03-30. Written by Rubinigg Michael. Reviewed by Rubinigg Michael.

Revision

Table 1. List of revisions made to the document. Identifier of revision (No); date of revision (Date); description of revision (Description); reason for revision (Reason).
No Date Description Reason
1 2025-03-30 14:03:12 Initial release. N.A.

Abbreviations

EU
European Union
EUPH
EU Pollinator Hub
FAIR
Findability, Accessibility, Interoperability, and Reuse of digital assets
INRAE
Institut national de recherche pour l’agriculture, l’alimentation et l’environnement (France's National Research Institute for Agriculture, Food and Environment)
n.a.
not available

Executive Summary

Data overview:

The data was published by Alaux CA (INRAE) on the B-GOOD Bee Health Data Portal as part of the B-GOOD project (grant agreement 817622), funded under the EU Horizon 2020 Research and Innovation Programme. It contains records of the daily number of exits and entrances of bees from six colonies per location in 2023 at Avignon, France (2023-03-31 – 2023-11-15), Halle, Germany (2023-05-17 – 2023-11-15) and Gent, Belgium (2023-06-02 – 2023-11-15).

Data value:

The objectives of the B-GOOD project were: (1) Facilitate decision making for beekeepers and other stakeholders by establishing ready-to-use tools for operationalising the Health Status Index (HSI); (2) Test, standardise and validate methods for measuring and reporting selected indicators affecting bee health; (3) Explore the various socio-economic and ecological factors beyond bee health; (4) Foster an EU community to collect and share knowledge related to honey bees and their environment; (5) Engender a lasting learning and innovation system (LIS); (6) Minimise the impact of biotic and abiotic stressors.

Data description:

The dataset consists of one table (591.97 KiB) containing 6,960 records.

Data application:

Currently, the data integrated from the B-GOOD Bee Health Data Portal contains major issues and does not comply with the FAIR Guiding Principles for scientific data management and stewardship applied on the EU Pollinator Hub. More descriptive information about the context, quality and condition, or characteristics of the data (e.g. protocols, measurement devices used, units of the captured data, or any other details about the study) must be provided. More metadata in the form of accurate and relevant attributes must also be provided, e.g. a description of the scope of the data, any particularities or limitations of the data that other users should be aware of, the date of generation or collection of the data, the lab conditions, who prepared the data, the parameter settings, the name and version of the software used, whether the data are raw or processed, and an explanation of all variable names that are not self-explanatory. The dataset requires major revision by the data provider.

Introduction

The data was published by Alaux CA (INRAE) on the B-GOOD Bee Health Data Portal as part of the B-GOOD project (grant agreement 817622), funded under the EU Horizon 2020 Research and Innovation Programme. The dataset contains records of the daily number of exits and entrances of bees from six colonies per location in 2023 at Avignon, France (2023-03-31 – 2023-11-15), Halle, Germany (2023-05-17 – 2023-11-15) and Gent, Belgium (2023-06-02 – 2023-11-15).

The objectives of the B-GOOD project were: (1) Facilitate decision making for beekeepers and other stakeholders by establishing ready-to-use tools for operationalising the Health Status Index (HSI); (2) Test, standardise and validate methods for measuring and reporting selected indicators affecting bee health; (3) Explore the various socio-economic and ecological factors beyond bee health; (4) Foster an EU community to collect and share knowledge related to honey bees and their environment; (5) Engender a lasting learning and innovation system (LIS); (6) Minimise the impact of biotic and abiotic stressors.

Material and Methods

Data Acquisition

All raw data files were downloaded from the B-GOOD Bee Health Data Portal on 2024-09-23 18:47:55.

List of raw data obtained from the data provider.

  1. File bee-counters-exen-daily-counts, accessed on 2024-09-23 18:47:55, provided by B-GOOD Bee Health Data Portal

Metadata was obtained from the dataset's web page.

Table 2. List of raw data and metadata files included in the dataset. Identifier of table row (No); name of the file (File); the type of the file (Type); file contains data (D); file contains metadata (M); date of upload of the file to the EU Pollinator Hub (Arrival); number of data points contained within the file, if applicable (Data points); uploaded file size (File size).
No File Type D M Arrival Data points File size
1 b-good bee counter data_PREP_MR_240923.csv CSV - Comma separated values Yes No 2024-09-23 15:09:18 62,640 591.97 KiB

Data Preparation

The file was extracted from the zip archive using File Explorer (Microsoft Corporation, version 22H2).

The raw data file bee-counters-exen-daily-counts was processed with MS Excel (Microsoft Corporation, version 2409). The datetime in column date was split into the columns date and time and formatted according to ISO 8601, and the additional columns city, NUTS3, device, device_ID and direction were inserted and filled with the appropriate information. The worksheets were then exported to separate data files in CSV format (UTF-8 encoding) and opened in Notepad++ (version 8.7), where missing values were replaced with {NULL} using regular expressions. All data files were then merged using the Python script MergeCsv.py. Finally, the merged data file was unpivoted using the Python script UnpivotToCsv.py.
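Unpivoting turns a wide table, with one column per device and direction, into the long layout described in Table 7. The scripts MergeCsv.py and UnpivotToCsv.py are not published with this report, so the following is only a minimal Python sketch of the technique, not their content. It assumes a hypothetical wide CSV with a column location_name, a column date holding a datetime such as 2023-06-02 02:00, and one column per device headed like CPT_2481 - Nb entrances (per day); the matching "Nb exits" header wording is an assumption. For illustration it also performs the date/time split that was actually done in MS Excel. The names unpivot, HEADER and DIRECTION are hypothetical.

```python
import csv
import io
import re
from datetime import datetime

# Assumed header layout of the wide raw file, e.g. "CPT_2481 - Nb entrances (per day)";
# the "Nb exits" variant is an assumption.
HEADER = re.compile(r"(CPT_\d{4}) - Nb (entrances|exits) \(per day\)")
DIRECTION = {"entrances": "in", "exits": "out"}


def unpivot(wide_csv: str) -> list:
    """Turn one row per timestamp into one row per timestamp and device column."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(wide_csv)):
        # Split the datetime (assumed "YYYY-MM-DD HH:MM") into ISO 8601 date and time.
        stamp = datetime.fromisoformat(row["date"])
        for column, value in row.items():
            match = HEADER.fullmatch(column)
            if match is None:
                continue  # not a device column (e.g. location_name or date)
            device_id, movement = match.groups()
            long_rows.append({
                "location_name": row["location_name"],
                "date": stamp.date().isoformat(),
                "time": stamp.strftime("%H:%M"),
                "device": column,
                "device_ID": device_id,
                "direction": DIRECTION[movement],
                # An empty cell becomes None, the in-memory counterpart of {NULL}.
                "number": int(value) if value.strip() else None,
            })
    return long_rows
```

Keeping the full header in device and deriving device_ID and direction from it mirrors the column definitions given for the table.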

Data was then exported to the respective preparatory files and uploaded to the EU Pollinator Hub according to SOP-017 (Dataset integration).

Data Validation

No data validation was performed.

Data Analysis

No data analysis was performed.

Data Description

Dataset

Table 3. Summary of tables belonging to the dataset. Table row identifier (No); name of the table (Table); description of the table (Description).
No Table Description
1 bee counter data Daily number of exits and entrances of bees at colony level. The structure of the table provided by the B-GOOD…
Table 4. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
UID BGDBC176.0.0
Name B-GOOD Bee Counter Data
Title Dataset from the B-GOOD project, containing data from bee counters on daily exits and entrances of bees from colonies.
IRI https://app.pollinatorhub.eu/dataset-discovery/BGDBC176.0.0
Licence CC BY-NC-ND 4.0
DOI n/a
Creation date 2024-09-23
Publishing date 2025-03-17
Contact information n/a
Keywords Apis mellifera, bee counter, beehive, honey bee
Data collection years 2023
Regions in which the data was collected Arr. Gent, Halle (Saale), Kreisfreie Stadt, Vaucluse
Description

The dataset contains records of the daily number of exits and entrances of bees from six colonies per location in 2023 at Avignon, France (2023-03-31 – 2023-11-15), Halle, Germany (2023-05-17 – 2023-11-15) and Gent, Belgium (2023-06-02 – 2023-11-15). It was published by Alaux CA (INRAE) on the B-GOOD Bee Health Data Portal as part of the B-GOOD project (grant agreement 817622), funded under the EU Horizon 2020 Research and Innovation Programme.
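The per-location record counts reported for the table (2,760, 2,196 and 2,004) can be cross-checked against the measurement periods. Assuming one record per colony, direction and day without gaps (six colonies times two directions gives 12 daily series per location), the expected count is 12 times the number of days in the period, both ends included. A minimal Python sketch of this arithmetic, with the hypothetical helper expected_records:

```python
from datetime import date

# Assumption: 6 colonies x 2 directions (in/out) = 12 daily series per location.
SERIES_PER_LOCATION = 6 * 2


def expected_records(first_day: str, last_day: str) -> int:
    """Records expected for one location if every series has one value per day
    from first_day to last_day, both included."""
    days = (date.fromisoformat(last_day) - date.fromisoformat(first_day)).days + 1
    return SERIES_PER_LOCATION * days


periods = {
    "INRAE": ("2023-03-31", "2023-11-15"),
    "HALLE": ("2023-05-17", "2023-11-15"),
    "GENT": ("2023-06-02", "2023-11-15"),
}
counts = {name: expected_records(*span) for name, span in periods.items()}
print(counts, sum(counts.values()))
# → {'INRAE': 2760, 'HALLE': 2196, 'GENT': 2004} 6960
```

The periods of 230, 183 and 167 days reproduce the reported counts exactly, which indicates that each device delivered one value per direction and day over these periods.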

Table 5. Standardised metadata of the data provider B-GOOD Bee Health Data Portal. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name B-GOOD Bee Health Data Portal
URL
Acronym B-GOOD
IRI https://app.pollinatorhub.eu/data-providers/b-good-bee-health-data-portal
Address https://b-good-project.eu
Country Belgium
Contact information b-good-project.eu
Description

Project funded by the EU Horizon 2020 Research and Innovation Programme under grant agreement No 817622. Project website: https://b-good-project.eu

Tables

bee counter data

Table 6. Standardised metadata of the table. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
UID BGDBC176.BCNTR361.0
Name bee counter data
IRI https://app.pollinatorhub.eu/dataset-discovery/parts/BGDBC176.BCNTR361.0
Type File
Licence CC BY-NC-ND 4.0
Description

Table bee counter data contains 6,960 records of the daily number of exits and entrances of bees from six colonies in each of the locations INRAE (Avignon, France), measured from 2023-03-31 to 2023-11-15, HALLE (Halle, Germany), measured from 2023-05-17 to 2023-11-15, and GENT (Gent, Belgium), measured from 2023-06-02 to 2023-11-15. Locations INRAE, HALLE and GENT account for 2,760, 2,196 and 2,004 records, respectively. A total of 138 values in column number are missing: 114 records for location INRAE, 18 for HALLE and 6 for GENT. The analysis of outliers revealed extreme values greater than 340,000 day⁻¹ (up to 1,805,380 day⁻¹) in the number of bees entering and exiting the hive, in device CPT_0651 (12 records, measured on 2023-09-27 and from 2023-11-11 to 2023-11-15) and in device CPT_7680 (2 records, measured on 2023-08-20), both placed in Halle.
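The extreme values above were identified with a threshold of 340,000 bees per day. The report does not state how this threshold was chosen, so the following minimal Python sketch only shows how records exceeding such a threshold can be listed per device. It assumes records are dictionaries keyed by the column names of the table, with None standing for {NULL}; the function name flag_extremes is hypothetical.

```python
from collections import defaultdict

# Threshold taken from the outlier analysis above; how it was chosen is not documented.
THRESHOLD = 340_000


def flag_extremes(records, threshold=THRESHOLD):
    """Map each device_ID to its (date, direction, number) records above threshold."""
    flagged = defaultdict(list)
    for record in records:
        number = record["number"]
        # Missing counts (None, i.e. {NULL}) are skipped rather than treated as 0.
        if number is not None and number > threshold:
            flagged[record["device_ID"]].append(
                (record["date"], record["direction"], number)
            )
    return dict(flagged)
```

Grouping by device makes it easy to see whether extremes cluster on single sensors, as they do here for CPT_0651 and CPT_7680.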

Metadata

n/a
Table 7. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Name Description Data type Descriptor Unit
location_name

Name of the location, as assigned by the data owner, in which the values provided in the column number were measured.

String Text [0.0.TEXTA315]

n/a

city

Name of the city where the honey bee colonies, in which the values provided in the column number were measured, were located.

String Text [0.0.TEXTA315]

n/a

NUTS3

NUTS level 3 code of the region where the honey bee colonies, in which the values provided in the column number were measured, were located.

String nuts2021Code [0.0.NTSCD55]

n/a

date

Calendar date extracted from the column date in the raw data file of the data provider. Its exact meaning is not specified by the data provider; presumably the calendar date on which the values provided in the column number were measured.

Date calendarDate [0.0.DATEA317]

n/a

time

Local time of day extracted from the column date in the raw data of the data provider. Its exact meaning is not specified by the data provider; presumably the local time of day at which the values were transmitted by the remote sensing device.

Time localTimeOfDay [0.0.TMFDY464]

n/a

device

Name of the device used to measure the values provided in the column number, as given in the column headers of the raw data file of the data provider.

String Text [0.0.TEXTA315]

n/a

device_ID

Identifier of the device, extracted from the column device, which was used to measure the values provided in the column number.

String Text [0.0.TEXTA315]

n/a

direction

Direction of movements of bees at the hive entrance ({in}: bees entering the beehive; {out}: bees leaving the beehive), extracted from the column device.

String Text [0.0.TEXTA315]

n/a

number

Number of individual honey bees entering or leaving the beehive per day, as specified in column direction.

Integer number Integer [0.0.NTGER313]

no. day⁻¹

Metadata of the individual columns can be found in Annex 1.
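The column definitions in Table 7 can be turned into simple per-record checks. No data validation was performed for this dataset, so the following Python sketch is illustrative only. The function check_record is hypothetical, the NUTS3 pattern (two capital letters followed by three letters or digits, as in BE234 and FRL06) is an assumption, and records are assumed to be read from the CSV as strings with {NULL} marking a missing count.

```python
import re
from datetime import date, time

# Assumed shape of a NUTS level 3 code: country code plus three characters.
NUTS3_CODE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}")
COUNT = re.compile(r"[0-9]+")


def check_record(row: dict) -> list:
    """Return descriptions of the problems found in one record (empty if none)."""
    problems = []
    try:
        date.fromisoformat(row["date"])
    except ValueError:
        problems.append("date is not an ISO 8601 calendar date")
    try:
        time.fromisoformat(row["time"])
    except ValueError:
        problems.append("time is not an ISO 8601 time of day")
    if not NUTS3_CODE.fullmatch(row["NUTS3"]):
        problems.append("NUTS3 does not match the expected code pattern")
    if not row["device"].startswith(row["device_ID"]):
        problems.append("device_ID is not the prefix of device")
    if row["direction"] not in ("in", "out"):
        problems.append("direction is neither in nor out")
    if row["number"] != "{NULL}" and not COUNT.fullmatch(row["number"]):
        problems.append("number is neither {NULL} nor a non-negative integer")
    return problems
```

Returning a list of messages instead of raising on the first failure lets all problems of a record be reported at once.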

Descriptive Measures

Table 8. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
location_name 4 - 5 n/a GENT n/a n/a n/a INRA 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 3 ( 0.0% )
city 4 - 7 n/a Avignon n/a n/a n/a Halle 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 3 ( 0.0% )
NUTS3 5 - 5 n/a BE234 n/a n/a n/a FRL06 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 3 ( 0.0% )
date 10 - 10 n/a 2023-03-31 n/a n/a n/a 2023-11-15 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 230 ( 3.3% )
time 5 - 5 n/a 01:00 n/a n/a n/a 02:00 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 0.0% )
device 29 - 33 n/a CPT_0255 - N… n/a n/a n/a CPT_9976 - N… 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 36 ( 0.5% )
device_ID 8 - 8 n/a CPT_0255 n/a n/a n/a CPT_9976 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 18 ( 0.3% )
direction 2 - 3 n/a in n/a n/a n/a out 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 0.0% )
number 1 - 7 58,308.4 0 8,673.75 40,549.5 89,229.5 1,805,380 6,960 138 ( 2.0% ) 33 ( 0.5% ) 0 ( 0.0% ) 6,350 ( 91.2% )
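The measures in Table 8 for column number can be computed along the following lines. The report does not state which quartile definition was used; this minimal Python sketch assumes linear interpolation between order statistics (statistics.quantiles with method="inclusive"), which is compatible with fractional quartiles such as the Q1 of 8,673.75. NULL is represented as None and, as in the caption of Table 8, counted as one distinct value. The function name describe is hypothetical.

```python
import statistics


def describe(values):
    """Descriptive measures of a numeric column in which None marks a missing value."""
    present = sorted(v for v in values if v is not None)
    # Quartiles by linear interpolation (assumed; the report does not say).
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(present),
        "min": present[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": present[-1],
        "total": len(values),
        "missing": sum(v is None for v in values),
        "zero": sum(v == 0 for v in values),
        "distinct": len(set(values)),  # None counts as one distinct value
    }
```

Mean and quartiles are computed over non-missing values only, while total, missing and distinct refer to all records.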

Quality Measures

Table 9. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
location_name 100.00% 0.04% INRA GENT
city 100.00% 0.04% Avignon Gent
NUTS3 100.00% 0.04% FRL06 BE234
date 100.00% 3.30% 2023-11-15 2023-05-16
time 100.00% 0.03% 02:00 01:00
device 100.00% 0.52% CPT_2481 - Nb entrances (per day) CPT_0255 - Nb entrances (per day)
device_ID 100.00% 0.26% CPT_2481 CPT_0255
direction 100.00% 0.03% in in
number 98.02% 91.24% 0 954
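The percentages in Table 9 are consistent with completeness being the share of non-NULL values and uniqueness being the number of distinct values (including NULL) divided by the number of records; for column number, (6,960 - 138) / 6,960 ≈ 98.02% and 6,350 / 6,960 ≈ 91.24%. A minimal Python sketch under that reading follows; the function name quality is hypothetical. Ties between equally frequent values are broken here by first occurrence, which may differ from the tool used for this report and matters for column direction, where Table 9 lists in as both the most and the least common value.

```python
from collections import Counter


def quality(values):
    """Completeness, uniqueness and most/least common value of one column (None = NULL)."""
    total = len(values)
    present = [v for v in values if v is not None]
    # most_common() orders equal counts by first occurrence.
    ranked = Counter(present).most_common()
    return {
        "completeness": 100 * len(present) / total,
        "uniqueness": 100 * len(set(values)) / total,
        "most_common": ranked[0][0],
        "least_common": ranked[-1][0],
    }
```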

Changes made to preparatory file

  1. Columns city and NUTS3 were inserted. Values in the columns were derived from the metadata provided by the data owner on the B-GOOD Bee Health Data Portal.
  2. Column time was inserted. Values in columns date and time were derived from the datetime in column date of the raw data provided on the B-GOOD Bee Health Data Portal. The meaning of the datetime was not specified by the data provider, but there are good reasons to assume that it is the date and time of data transmission by the remote sensors, which did not occur at the same time in all records (either 01:00 or 02:00). To facilitate automated data processing, date and time are stored separately, as the exact time of transmission might be irrelevant.
  3. Column device was inserted. Values in column device were derived from the headers of the columns holding the values acquired with the respective device in the raw data provided on the B-GOOD Bee Health Data Portal.
  4. Column device_ID was inserted. Values in column device_ID were derived from the assumed device name contained in column device.
  5. Column direction was inserted. Values in column direction were derived from the direction of movements of bees at the hive entrance ({in}: bees entering the beehive; {out}: bees leaving the beehive) extracted from the column device.
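Item 1 amounts to a lookup from location_name to city and NUTS3. As a minimal Python sketch, such a lookup could look as follows. The dictionary LOCATIONS and the function add_location_columns are hypothetical; the pairs INRA/Avignon/FRL06 and GENT/Gent/BE234 follow the values reported in Tables 4 and 8, while DEE02 for Halle (Saale), Kreisfreie Stadt is our reading of the NUTS 2021 classification and should be checked.

```python
# Hypothetical lookup: location_name -> (city, NUTS3 code).
LOCATIONS = {
    "INRA": ("Avignon", "FRL06"),
    "HALLE": ("Halle", "DEE02"),  # assumed code for Halle (Saale), Kreisfreie Stadt
    "GENT": ("Gent", "BE234"),
}


def add_location_columns(record: dict) -> dict:
    """Return a copy of the record with the columns city and NUTS3 filled in."""
    city, nuts3 = LOCATIONS[record["location_name"]]
    return {**record, "city": city, "NUTS3": nuts3}
```

An unknown location name raises a KeyError, which surfaces unexpected values instead of silently leaving the new columns empty.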

Changes made to data

  1. In 138 records in which column number contained missing data (114 records for location INRAE, 18 for HALLE and 6 for GENT), the missing value was replaced with {NULL}.
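The substitution was done with regular expressions in Notepad++. An equivalent minimal Python sketch is shown below; it works on a long-format CSV with the columns location_name and number (an assumption for illustration), replaces empty cells with {NULL} and counts the replacements per location, which is how a per-location breakdown of missing values can be tallied. The function name fill_nulls is hypothetical.

```python
import csv
import io
from collections import Counter


def fill_nulls(long_csv: str, column: str = "number", marker: str = "{NULL}"):
    """Replace empty cells in one column with a NULL marker and count them per location."""
    reader = csv.DictReader(io.StringIO(long_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    per_location = Counter()
    for row in reader:
        if not row[column].strip():  # empty or whitespace-only cell
            row[column] = marker
            per_location[row["location_name"]] += 1
        writer.writerow(row)
    return out.getvalue(), per_location
```

Working through the csv module rather than a plain text substitution avoids matching commas that occur inside quoted fields.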

Unresolved issues

  1. Due to the missing description of the data, it is unclear what the time provided with the date in the raw data refers to. The data provider is requested to make this information available.
  2. It is unclear whether the number of bees entering or leaving the beehive reported in the raw data refers to the date provided in column date or to the previous day, given that the time provided with the date (either 01:00 or 02:00) is often used as the transmission time of remote devices. The data provider is requested to make this information available.
  3. Extreme outliers have been found in the number of bees entering and exiting the hive, which should be further investigated by the data provider.
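Issue 2 affects how the column date should be read. If the data provider were to confirm that values transmitted at 01:00 or 02:00 summarise the preceding day (this is not confirmed), the measurement day could be derived as in the following minimal Python sketch. The function name presumed_count_day and the default one-day offset are assumptions.

```python
from datetime import date, timedelta


def presumed_count_day(stored_date: str, offset_days: int = 1) -> str:
    """Calendar day a daily count would refer to if it summarises the day
    before its transmission date (an unconfirmed assumption)."""
    day = date.fromisoformat(stored_date) - timedelta(days=offset_days)
    return day.isoformat()
```

Under this reading, the first INRAE records, dated 2023-03-31, would describe 2023-03-30; calling the function with offset_days=0 keeps the stored date unchanged.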

References

  1. Alaux CA. Bee Health Data Portal - Dataset. [2024-09-23] beehealthdata.org
  2. Anonymous. GO FAIR initiative: Make your data & services FAIR. (en-US) GO FAIR. [2024-10-01] www.go-fair.org

Annex 1: Table column reports

bee counter data

location_name

Table 10. Standardised metadata of the column. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name location_name
Description

Name of the location, as assigned by the data owner, in which the values provided in the column number were measured.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

Table 11. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
location_name 4 - 5 n/a GENT n/a n/a n/a INRA 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 3 ( 0.0% )
Table 12. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
location_name 100.00% 0.04% INRA GENT

Data Distribution Top 20

Figure 1. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 2. Visualization of completeness of the data in the column.

Uniqueness

Figure 3. Visualization of uniqueness of the data in the column.

city

Table 13. Standardised metadata of the column. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name city
Description

Name of the city where the honey bee colonies, in which the values provided in the column number were measured, were located.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

Table 14. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
city 4 - 7 n/a Avignon n/a n/a n/a Halle 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 3 ( 0.0% )
Table 15. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
city 100.00% 0.04% Avignon Gent

Data Distribution Top 20

Figure 4. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 5. Visualization of completeness of the data in the column.

Uniqueness

Figure 6. Visualization of uniqueness of the data in the column.

NUTS3

Table 16. Standardised metadata of the column. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name NUTS3
Description

NUTS level 3 code of the region where the honey bee colonies, in which the values provided in the column number were measured, were located.

Data type String
Descriptor eurostat:nuts2021Code [UID:0.0.NTSCD55]
Descriptor description

A NUTS code defined in the NUTS classification 2021, valid from 2021-01-01 to 2023-12-31, containing 92 regions at NUTS level 1, 244 regions at NUTS level 2 and 1165 regions at NUTS level 3.

IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTSCD55
Unit

n/a

Table 17. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
NUTS3 5 - 5 n/a BE234 n/a n/a n/a FRL06 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 3 ( 0.0% )
Table 18. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
NUTS3 100.00% 0.04% FRL06 BE234

Data Distribution Top 20

Figure 7. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 8. Visualization of completeness of the data in the column.

Uniqueness

Figure 9. Visualization of uniqueness of the data in the column.

date

Table 19. Standardised metadata of the column. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name date
Description

Calendar date extracted from the column date in the raw data file of the data provider. Its exact meaning is not specified by the data provider; presumably the calendar date on which the values provided in the column number were measured.

Data type Date
Descriptor iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description

particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]

IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit

n/a

Table 20. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
date 10 - 10 n/a 2023-03-31 n/a n/a n/a 2023-11-15 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 230 ( 3.3% )
Table 21. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
date 100.00% 3.30% 2023-11-15 2023-05-16

Completeness

Figure 10. Visualization of completeness of the data in the column.

Uniqueness

Figure 11. Visualization of uniqueness of the data in the column.

time

Table 22. Standardised metadata of the column. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name time
Description

Local time of day extracted from the column date in the raw data of the data provider. Its exact meaning is not specified by the data provider; presumably the local time of day at which the values were transmitted by the remote sensing device.

Data type Time
Descriptor iso-8601:localTimeOfDay [UID:0.0.TMFDY464]
Descriptor description

time of day [...] in a local time scale [...]

IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TMFDY464
Unit

n/a

Table 23. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
time 5 - 5 n/a 01:00 n/a n/a n/a 02:00 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 0.0% )
Table 24. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
time 100.00% 0.03% 02:00 01:00

Data Distribution Top 20

Figure 12. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 13. Visualization of completeness of the data in the column.

Uniqueness

Figure 14. Visualization of uniqueness of the data in the column.

device

Table 25. Standardised metadata of the column. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name device
Description

Name of the device used to measure the values provided in the column number, as given in the column headers of the raw data file of the data provider.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

Table 26. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
device 29 - 33 n/a CPT_0255 - N… n/a n/a n/a CPT_9976 - N… 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 36 ( 0.5% )
Table 27. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
device 100.00% 0.52% CPT_2481 - Nb entrances (per day) CPT_0255 - Nb entrances (per day)

Data Distribution Top 20

Figure 15. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 16. Distribution of 20 least common values, from lowest to highest.

Completeness

Figure 17. Visualization of completeness of the data in the column.

Uniqueness

Figure 18. Visualization of uniqueness of the data in the column.

device_ID

Table 28. Standardised metadata of the column. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name device_ID
Description

Identifier of the device, extracted from the column device, which was used to measure the values provided in the column number.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a
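The extraction of device_ID from column device can be sketched as follows. The pattern is an assumption based on the values shown in this report (for example CPT_0255, CPT_2481 and CPT_9976, all 8 characters long, followed by " - " and a label); the name `DEVICE_ID_PATTERN` is hypothetical and not taken from the data provider.

```python
import re

# ASSUMED pattern: "CPT_" plus four digits (8 characters), then " - " and a label.
DEVICE_ID_PATTERN = re.compile(r"^(CPT_\d{4}) - ")


def extract_device_id(device: str) -> str:
    """Return the device identifier at the start of a device string."""
    match = DEVICE_ID_PATTERN.match(device)
    if match is None:
        raise ValueError(f"unexpected device string: {device!r}")
    return match.group(1)


print(extract_device_id("CPT_2481 - Nb entrances (per day)"))  # CPT_2481
```

Failing loudly on strings that do not match keeps unexpected header formats from silently producing wrong identifiers.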

Table 29. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
device_ID 8 - 8 n/a CPT_0255 n/a n/a n/a CPT_9976 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 18 ( 0.3% )
Table 30. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
device_ID 100.00% 0.26% CPT_2481 CPT_0255

Data Distribution Top 20

Figure 19. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 20. Visualization of completeness of the data in the column.

Uniqueness

Figure 21. Visualization of uniqueness of the data in the column.

direction

Table 31. Standardised metadata of the column. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name direction
Description

Direction of bee movement at the hive entrance ({in}: bees entering the beehive; {out}: bees leaving the beehive), extracted from column device.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a
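A minimal sketch of how direction could be derived from column device. The keyword "entrances" appears in the device values reported in this section. The exit label is truncated in Table 26, so the keyword "exits" is an assumption, as is the mapping table `DIRECTION_KEYWORDS`.

```python
# Keyword -> direction. "entrances" is attested in this report;
# "exits" is an ASSUMED keyword because the exit label is truncated in Table 26.
DIRECTION_KEYWORDS = {"entrances": "in", "exits": "out"}


def extract_direction(device: str) -> str:
    """Map a device string such as 'CPT_2481 - Nb entrances (per day)' to 'in' or 'out'."""
    _, sep, label = device.partition(" - ")
    if not sep:
        raise ValueError(f"no ' - ' separator in device string: {device!r}")
    found = {d for keyword, d in DIRECTION_KEYWORDS.items() if keyword in label.lower()}
    if len(found) != 1:
        raise ValueError(f"cannot determine direction from label: {label!r}")
    return found.pop()


print(extract_direction("CPT_2481 - Nb entrances (per day)"))  # in
```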

Table 32. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
direction 2 - 3 n/a in n/a n/a n/a out 6,960 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2 ( 0.0% )
Table 33. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
direction 100.00% 0.03% in in

Data Distribution Top 20

Figure 22. Distribution of 20 most common values, from highest to lowest.

Completeness

Figure 23. Visualization of completeness of the data in the column.

Uniqueness

Figure 24. Visualization of uniqueness of the data in the column.

number

Table 34. Standardised metadata of the column. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name number
Description

Number of individual honey bees entering or leaving the beehive per day, as specified in column direction.

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

no. day⁻¹

Table 35. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Name Length Mean Min Q1 Median Q3 Max Total Missing Zero Blank Distinct
number 1 - 7 58,308.4 0 8,673.75 40,549.5 89,229.5 1,805,380 6,960 138 ( 2.0% ) 33 ( 0.5% ) 0 ( 0.0% ) 6,350 ( 91.2% )
Table 36. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Name Completeness Uniqueness Most Common Value Least Common Value
number 98.02% 91.24% 0 954
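The statistics in Table 35 can be recomputed once NULL values are excluded. The report does not state which quantile method was used. The fractional Q1 (8,673.75) suggests interpolation between neighbouring values, so the sketch below assumes linear interpolation (`statistics.quantiles` with `method="inclusive"`). The function name `summarise` is hypothetical.

```python
import statistics
from typing import Optional, Sequence


def summarise(values: Sequence[Optional[int]]) -> dict:
    """Content statistics for an integer column; None stands for NULL."""
    present = [v for v in values if v is not None]  # NULLs are excluded from the statistics
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(present),
        "min": min(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(present),
        "missing": len(values) - len(present),
        "zero": sum(v == 0 for v in present),
        "distinct": len(set(values)),  # NULL counted as a distinct value, as in Table 35
    }


print(summarise([0, 10, None, 20, 30, 40]))
```

Keeping Missing and Zero as separate counts mirrors Table 35, where 138 missing values and 33 genuine zero counts are reported independently.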

Continuous Data Distribution

Figure 25. Distribution of values in the column.

Outliers

Figure 26. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 27. Visualization of completeness of the data in the column.

Uniqueness

Figure 28. Visualization of uniqueness of the data in the column.

Changes made to preparatory file

  1. Columns city and NUTS3 were inserted. Values in the columns were derived from the metadata provided by the data owner on the B-GOOD Bee Health Data Portal.
  2. Column time was inserted. Values in columns date and time were derived from the datetime in column date of the raw data provided on the B-GOOD Bee Health Data Portal. The meaning of this datetime was not specified by the data provider, but it is most likely the date and time at which the remote sensors transmitted the data; the transmission time differs between records (either 01:00 or 02:00). To facilitate automated data processing, date and time are stored separately, as the exact transmission time is probably irrelevant.
  3. Column device was inserted. Values in column device were derived from the column headers under which the values acquired with each device are listed in the raw data provided on the B-GOOD Bee Health Data Portal.
  4. Column device_ID was inserted. Values in column device_ID were derived from the device name assumed to be contained in column device (e.g. CPT_2481 from CPT_2481 - Nb entrances (per day)).
  5. Column direction was inserted. Values in column direction ({in}: bees entering the beehive; {out}: bees leaving the beehive) were derived from the direction of bee movement at the hive entrance given in column device.
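Splitting the raw datetime into separate date and time values can be sketched as below. The raw format "YYYY-MM-DD HH:MM:SS" is an assumption, because the raw export format is not documented here. The switch `attribute_to_previous_day` is hypothetical. It shows how the dates would change if the counts turned out to belong to the day before transmission (unresolved issue 2); it is off by default, matching the processing described above.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def split_datetime(raw: str, attribute_to_previous_day: bool = False) -> Tuple[date, time]:
    """Split an ASSUMED 'YYYY-MM-DD HH:MM:SS' string into (date, time)."""
    stamp = datetime.fromisoformat(raw)
    day = stamp.date()
    if attribute_to_previous_day:
        # Hypothetical option: treat the count as belonging to the day before transmission.
        day -= timedelta(days=1)
    return day, stamp.time()


print(split_datetime("2023-05-17 01:00:00"))  # (datetime.date(2023, 5, 17), datetime.time(1, 0))
```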

Changes made to data

  1. In 138 records (114 for location INRAE, 18 for location HALLE, 6 for location GENT), missing data in column number were replaced by {NULL}.
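The replacement of missing counts by {NULL} can be sketched as follows. How gaps were encoded in the raw data is not stated, so the set `MISSING_MARKERS` is a hypothetical assumption. The point of the sketch is that genuine zero counts are kept as 0 and not merged with missing values, which is why Table 35 can report 138 missing and 33 zero values separately.

```python
from typing import Iterable, List, Optional

# HYPOTHETICAL markers for missing data in the raw export.
MISSING_MARKERS = {"", "NA", "NaN", "n/a"}


def normalise_counts(raw_values: Iterable[str]) -> List[Optional[int]]:
    """Convert raw count strings to integers, replacing missing entries with None (NULL)."""
    cleaned: List[Optional[int]] = []
    for raw in raw_values:
        text = raw.strip()
        cleaned.append(None if text in MISSING_MARKERS else int(text))
    return cleaned


print(normalise_counts(["12", "", "0", "NA", " 7 "]))  # [12, None, 0, None, 7]
```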

Unresolved issues

  1. Because no description of the data was provided, it is unclear what the time given with the date in the raw data refers to. The data provider is requested to make this information available.
  2. It is unclear whether the numbers of bees entering or leaving the beehive reported in the raw data refer to the date in column date or to the previous day, since the time given with the date (either 01:00 or 02:00) is often the transmission time of remote devices. The data provider is requested to make this information available.
  3. Extreme outliers were found in the number of bees entering and leaving the hive (maximum 1,805,380 per day against a median of 40,549.5; see Table 35), which should be investigated further by the data provider.
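The criterion behind the outliers in Figure 26 is not stated in this report. As one possible screening rule, the sketch below swaps in Tukey's fences, using k = 3 for "far out" values, and applies them to the quartiles of column number from Table 35. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def tukey_fences(q1: float, q3: float, k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) fences; k=3.0 gives Tukey's 'far out' limits, k=1.5 the usual ones."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values: Iterable[Optional[int]], low: float, high: float) -> List[int]:
    """Return the non-NULL values lying outside [low, high]."""
    return [v for v in values if v is not None and not (low <= v <= high)]


# Q1 and Q3 of column number, taken from Table 35.
low, high = tukey_fences(8_673.75, 89_229.5)
print(low, high)                                            # -232993.5 330896.75
print(flag_outliers([40_549, None, 1_805_380], low, high))  # [1805380]
```

The reported maximum of 1,805,380 lies far above the upper fence of 330,896.75. Because the quartiles in Table 35 pool all locations and months, computing the fences per location or per month may give a more informative screen.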