EU Pollinator Hub

Dataset Report
Unique identifier: VRRMN16.0.0
Title: Varroa monitoring Austria
Long title: Results from a Varroa monitoring program in various apiaries in Austria
Status: Quality Validated
Current Version: v. 1.0
Published: 2023-12-13
Reviewed by:
Citation proposal:
Biene Österreich – Imkereidachverband 2023 Report of dataset Varroa monitoring Austria, v. 1.0 [VRRMN16.0.0]. EU Pollinator Hub. [2025-07-12] app.pollinatorhub.eu
Compliance with FAIR* principles
Findable
Accessible
Interoperable
Reusable
See https://www.go-fair.org/fair-principles for more information about FAIR principles
Data Quality
Under evaluation

Document history

Release

Version v. 1.0 released on 2023-12-13.

Revision

Table 1. List of revisions made to the document. Identifier of revision (No); date of revision (Date); description of revision (Description); reason for revision (Reason).
No Date Description Reason
1 2023-12-13 00:12:00 Initial release. n/a

Abbreviations

No abbreviations.

Executive summary

Data overview:

n/a

Data value:

n/a

Data description:

n/a

Data application:

n/a

Unresolved issues:

n/a

Introduction

n/a

Material and methods

Data acquisition

n/a
Table 2. List of raw data and metadata files included in the dataset. Identifier of table row (No); name of the file (File); the type of the file (Type); file contains data (D); file contains metadata (M); date of upload of the file to the EU Pollinator Hub (Arrival); number of data points contained within the file (if applicable); uploaded file size.
No File Type D M Arrival Data points File size
1 hive.csv CSV - Comma separated values Yes No 2023-12-09 09:12:40 4,232 21.50 KiB
2 station.csv CSV - Comma separated values Yes No 2023-12-09 09:12:07 365 2.78 KiB
3 user.csv CSV - Comma separated values Yes No 2023-12-09 09:12:43 198 906.00 B
4 varroa_sampling.csv CSV - Comma separated values Yes No 2023-12-09 09:12:09 77,868 498.51 KiB
5 yard.csv CSV - Comma separated values Yes No 2023-12-09 10:12:52 968 5.22 KiB
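
The "Data points" figures in Table 2 appear to equal the number of records multiplied by the number of columns of each file. The report does not state this definition, so the sketch below treats it as an assumption and uses the record and column counts given in the per-table sections.

```python
# Record and column counts as reported in the per-table sections of this report.
# Assumption: "Data points" = records x columns (inferred, not stated by the report).
TABLES = {
    "hive.csv": (2116, 2),
    "station.csv": (73, 5),
    "user.csv": (99, 2),
    "varroa_sampling.csv": (11124, 7),
    "yard.csv": (242, 4),
}


def data_points(records: int, columns: int) -> int:
    """Number of individual cell values in a rectangular table."""
    return records * columns


for name, (records, columns) in TABLES.items():
    print(f"{name}: {data_points(records, columns):,}")
```

Under this assumption the computed values match the "Data points" column for all five files.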

Data preparation

n/a

Data validation

n/a

Data analysis

n/a

Data description

Dataset

Table 3. Summary of tables belonging to the dataset. Table row identifier (No); name of the table (Table); description of the table (Description).
No Table Description
1 hive The table maps the relationship between beekeepers (anonymised users of the web application which is used by beekeepers to provide…
2 station The table contains geographic information of the NOAA weather stations contained in the dataset. There are 73 unique weather stations…
3 user The table contains the number of samples (Varroa infestation data of beehives) that were provided by each single user. The…
4 Varroa sampling This table contains data on Varroa infestation levels (the number of varroa mites found in the sampling event) measured in…
5 weather The combined hourly weather data collected from 73 weather stations around Austria for the 8 years, total ~1.3 million rows.…
6 yard The table contains data on the apiaries (yards) at which the beehives for which the Varroa samples were obtained, were…
Table 4. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier VRRMN16.0.0
Title Varroa monitoring Austria
Long title Results from a Varroa monitoring program in various apiaries in Austria
Target IRI https://app.pollinatorhub.eu/dataset-discovery/VRRMN16.0.0
Licence CC BY-SA 4.0
DOI n/a
Created 2022-03-14
Published 2023-12-13
Contact n/a
Keywords Austria, Varroa destructor, monitoring
Data collection years 2012-2020
Regions in which the data was collected Austria
Abstract

An eight-year survey of Varroa destructor infestation rates of western honey bee (Apis mellifera) colonies across Austria and the spatial dimension, temporal dimension and weather factors that impact these infestation rates.

Table 5. Standardised metadata of the data provider Biene Österreich – Imkereidachverband. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name Biene Österreich – Imkereidachverband
Url
Acronym
IRI https://app.pollinatorhub.eu/data-providers/boe
Contact Georg-Coch Platz 3/11a, 1010 Wien www.biene-oesterreich.at office@biene-oesterreich.at
Description

The Austrian Beekeepers Federation (Biene Österreich-Imkereidachverband) is the umbrella organisation of the two largest beekeeping associations in Austria, the Austrian Beekeepers Association (ÖIB, Österreichischer Imkerbund) and the Austrian Professional Beekeepers Association (ÖEIB, Österreichischer Erwerbsimkerbund).

Tables

hive

Table 6. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier VRRMN16.HIVEA141.0
Name hive
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.HIVEA141.0
Table Type File
Licence CC BY-SA 4.0
Description

The table maps the relationship between beekeepers (anonymised users of the web application that beekeepers use to report Varroa infestation data for their bee yards) and the hives for which they reported Varroa infestation levels. There are 99 unique user_id values and 2116 unique hive_id values.
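
The counts of distinct identifiers can be checked directly against the file. The following is a minimal sketch, assuming hive.csv has a header row with the column names hive_id and user_id listed in Table 7; the sample rows are hypothetical. Such a check is useful here because the description reports 99 unique user identifiers, whereas Table 8 lists 103 distinct user_id values.

```python
import csv
import io


def count_distinct(csv_text: str) -> dict:
    """Return the number of distinct values per column of a CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            seen[name].add(value)
    return {name: len(values) for name, values in seen.items()}


# Hypothetical rows, not taken from the dataset.
SAMPLE = "hive_id,user_id\n1,10\n2,10\n3,8418\n"
print(count_distinct(SAMPLE))
```

To run it on the real file, the text of hive.csv can be passed in place of SAMPLE.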

Metadata

n/a
Table 7. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit
hive_id The hive identifier Integer number Integer [0.0.NTGER313] n/a
user_id The user identifier Integer number Integer [0.0.NTGER313] n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 8. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
hive_id 1 - 4 1,187.7 1 562.25 1,187.5 1,788.75 2,501 2,116 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2,116 ( 100.0% )
user_id 2 - 4 8,001.2 10 8,383 8,418 8,509 9,128 2,116 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 103 ( 4.9% )
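
The descriptive measures in this table can be reproduced for any numeric column along the following lines. This is a sketch only: the report does not state which quartile method was used, so linear interpolation (the "inclusive" method of Python's statistics module) is an assumption, and the input values are hypothetical.

```python
import statistics


def describe(values):
    """Mean, minimum, quartiles, maximum and counts for a list of numbers."""
    # Assumption: quartiles by linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "total": len(values),
        "distinct": len(set(values)),
    }


# Hypothetical values, not taken from the dataset.
SUMMARY = describe([1, 2, 3, 4, 5, 6, 7, 8])
print(SUMMARY)
```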

Quality measures

Table 9. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
hive_id 100.00% 100.00% 1 1
user_id 100.00% 4.87% 8418 8310
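
The completeness and uniqueness percentages appear to follow from the counts in Table 8. The formulas below are inferred from the reported values and are not stated by the report; they are shown for the user_id column of the hive table.

```python
def completeness(total: int, missing: int) -> float:
    """Percentage of records that hold a non-missing value."""
    return 100.0 * (total - missing) / total


def uniqueness(total: int, distinct: int) -> float:
    """Percentage of distinct values relative to the number of records."""
    return 100.0 * distinct / total


# Counts for user_id in the hive table, taken from Table 8.
print(f"{completeness(2116, 0):.2f}%")
print(f"{uniqueness(2116, 103):.2f}%")
```

With 2,116 records, no missing values and 103 distinct values, this gives 100.00% and 4.87%, in line with Table 9.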

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

station

Table 10. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier VRRMN16.STTNA142.0
Name station
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.STTNA142.0
Table Type File
Licence CC BY-SA 4.0
Description

The table contains geographic information on the NOAA weather stations included in the dataset. There are 73 unique weather stations, located between latitudes 46.617 and 48.683 and longitudes 9.617 and 16.600. Because these data are publicly available, the coordinates are not blurred. The stations lie at elevations between 153 meters and 1210 meters above sea level. 90% of the yard elevations are within 300 meters of the elevation of their weather station. This corresponds to a difference in air temperature of between 0 and 6 degrees Celsius, which can be calculated from the data provided to improve the accuracy of an analysis.
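
One way to account for the elevation difference between a yard and its weather station is a lapse-rate correction. The sketch below assumes a constant lapse rate of 6.5 degrees Celsius per 1000 meters (the standard-atmosphere value); the report does not say which rate or method it has in mind, and the function name and example values are hypothetical.

```python
# Assumption: a constant lapse rate of 6.5 degC per 1000 m (standard atmosphere);
# the report does not state which rate or method should be used.
LAPSE_RATE_C_PER_M = 6.5 / 1000.0


def adjust_temperature(station_temp_c, station_elev_m, yard_elev_m):
    """Estimate air temperature at the yard from the station reading."""
    return station_temp_c - LAPSE_RATE_C_PER_M * (yard_elev_m - station_elev_m)


# Hypothetical reading: a yard 300 m above its station.
print(adjust_temperature(20.0, 486.0, 786.0))
```

The elevations needed for this correction are the station_elevation column of this table and the elevation column of the yard table.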

Metadata

n/a
Table 11. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit
station_id The NOAA weather station identifier Integer number Integer [0.0.NTGER313] n/a
station_title The NOAA weather Station Name String Text [0.0.TEXTA315] n/a
latitude Latitude coordinates of the station in decimal degrees in WGS84 standard. Decimal number DecimalNumber [0.0.DCMLN314] n/a
longitude Longitude coordinates of the station in decimal degrees in WGS84 standard. Decimal number DecimalNumber [0.0.DCMLN314] n/a
station_elevation Meters above Sea Level Decimal number DecimalNumber [0.0.DCMLN314] n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 12. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
station_id 6 - 6 112,055.1 110,010 110,825 112,200 113,030 113,900 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 73 ( 100.0% )
station_title 4 - 26 n/a ALBERSCHWEND… n/a n/a n/a ZELTWEG/AUTO… 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 73 ( 100.0% )
latitude 2 - 6 47.5721 46.617 47.075 47.45 48.175 48.683 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 56 ( 76.7% )
longitude 2 - 6 14.4541 9.617 13.558 14.744 15.7665 16.6 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 66 ( 90.4% )
station_elevation 3 - 6 536.72 153 306.05 486 715.7 1,209.7 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 73 ( 100.0% )

Quality measures

Table 13. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
station_id 100.00% 100.00% 110010 110010
station_title 100.00% 100.00% WOLFSEGG WOLFSEGG
latitude 100.00% 76.71% 48.567 48.1
longitude 100.00% 90.41% 16.367 13.667
station_elevation 100.00% 100.00% 615.6 615.6

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

user

Table 14. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier VRRMN16.USERA143.0
Name user
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.USERA143.0
Table Type File
Licence CC BY-SA 4.0
Description

The table contains the number of samples (Varroa infestation data of beehives) provided by each user. The total number of samples collected is 11,124. It is important to note that there is a strong bias in the origin of the samples. A single user provided 27% of the samples in this dataset. About 53% of the samples come from 22 users who each provided 100 to 999 samples. 18% of the samples come from a group of 56 users who each provided 10 to 99 samples. 1% of the samples come from 24 users who each entered fewer than 10 samples.
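
The concentration of samples among users can be summarised by grouping the samples column into the size classes used above. This is a minimal sketch; the class boundaries follow the description, while the function name and the input counts are hypothetical.

```python
def share_by_bin(samples_per_user):
    """Share of all samples (%) contributed by users grouped by sample count."""
    bins = {"<10": 0, "10-99": 0, "100-999": 0, ">=1000": 0}
    for n in samples_per_user:
        if n < 10:
            bins["<10"] += n
        elif n < 100:
            bins["10-99"] += n
        elif n < 1000:
            bins["100-999"] += n
        else:
            bins[">=1000"] += n
    total = sum(samples_per_user)
    return {label: 100.0 * count / total for label, count in bins.items()}


# Hypothetical counts, not taken from the dataset.
SHARES = share_by_bin([5, 20, 75, 400, 1500])
print(SHARES)
```

For reference, the maximum of 3,058 samples reported in Table 16 corresponds to about 27% of the 11,124 samples.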

Metadata

n/a
Table 15. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit
user_id The user identifier Integer number Integer [0.0.NTGER313] n/a
samples Total numbers of samples provided by a user Integer number Integer [0.0.NTGER313] n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 16. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
user_id 2 - 4 8,403.9 10 8,354 8,411 8,617 9,128 99 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 99 ( 100.0% )
samples 1 - 4 111.6 1 10 25 96 3,058 99 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 65 ( 65.7% )

Quality measures

Table 17. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
user_id 100.00% 100.00% 10 10
samples 100.00% 65.66% 10 3058

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

Varroa sampling

Table 18. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier VRRMN16.VRRSM144.0
Name Varroa sampling
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.VRRSM144.0
Table Type File
Licence CC BY-SA 4.0
Description

This table contains data on Varroa infestation levels (the number of Varroa mites found in the sampling event) measured in individual hives at a given apiary, at a given time and with a given quality standard. Varroa samples were obtained from citizen-contributed and mined data from 3 sources, all using a standard method, between 2012 and 2020. The table contains 11124 Varroa sampling events. Roughly 21% of these events record zero mites. The highest number of mites recorded in a single sampling event is 5016. Sampling events span the period from 04/02/12 to 11/11/20 and last on average 7.2 days each (range: 1.0-23.8 days). Roughly 75% of the sampling events last between 3 and 9 days.

Data on mite infestation levels were collected by a standard method (natural mite fall) from 2012 to 2020, mainly in spring, early summer, and late summer. The data come from 3 sources of differing quality. Data from the highest-quality source, described as quality_control=2, were examined with the BeeVS diagnostic system (Apisfero, Turin, Italy), which consists of a high-resolution scanner that takes a picture of the samples (sticky boards placed under the brood nest of colonies) and cloud-based software that counts the mites on the sticky boards. Data from the intermediate source, described as quality_control=1, were examined manually by a trained group. Data from the lowest-quality source, described as quality_control=0, were examined manually by untrained individuals according to a classification scheme. Data were entered via a web terminal by whoever analysed the sample. The software vetted the data for plausibility (rejecting values that exceed 100 mites/day) and completeness (rejecting measuring intervals shorter than 3 days or longer than 21 days). Data exceeding these limits, which can be found in the dataset, were imported from external resources and approved by the supervisor. The data collected by untrained individuals were checked by the supervisor for plausibility.
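
The two vetting rules can be expressed as a small check on a single sampling event. The sketch below is not the web application's actual code. It assumes that dates use a month/day/two-digit-year format (suggested by values such as "11/17/17 15:20" in Table 21), that the limits are inclusive, and that the mite value is a daily rate; the function name and the example events are hypothetical.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%y %H:%M"  # assumed from values such as "11/17/17 15:20"
MAX_MITES_PER_DAY = 100.0       # plausibility limit named in the description
MIN_DAYS, MAX_DAYS = 3.0, 21.0  # accepted measuring interval, assumed inclusive


def check_sample(date_from: str, date_to: str, mites_per_day: float) -> list:
    """Return the list of rule violations for one sampling event."""
    start = datetime.strptime(date_from, DATE_FORMAT)
    end = datetime.strptime(date_to, DATE_FORMAT)
    days = (end - start).total_seconds() / 86400.0
    problems = []
    if mites_per_day > MAX_MITES_PER_DAY:
        problems.append("mite fall above 100 mites/day")
    if not MIN_DAYS <= days <= MAX_DAYS:
        problems.append("interval outside 3 to 21 days")
    return problems


# Hypothetical events, not taken from the dataset.
print(check_sample("4/4/20 8:00", "4/11/20 8:00", 12.0))
print(check_sample("4/4/20 8:00", "4/5/20 8:00", 150.0))
```

Records flagged by such a check are not necessarily errors, since imported records that exceed the limits were approved by the supervisor.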

From 2012 to 2016 the project was implemented only in the Austrian province of Styria, where approximately 3500 beekeepers supervised 53000 to 56000 honeybee colonies. In 2017 the crowdsourcing initiative was extended to all nine Austrian provinces, which are home to 28032 to 30237 beekeepers with 329402 to 390607 honeybee colonies in their care.

The total number of samples collected is 11124. Of these, 3824 (34%) were low quality samples (QC=0), 4033 (36%) were medium quality samples (QC=1) and 3267 (29%) were high quality samples (QC=2).

The varroa survey dataset includes 99 users (beekeepers), 242 bee yards (apiaries), and 2,116 hives from the nine Austrian provinces for a total of 11124 records pertaining to varroa infestation.

Metadata

n/a
Table 19. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit
sampling_id The sampling event identifier String dwc:materialSampleID [0.0.MTRLS489] n/a
date_from The first date (year, month, day) and time (hours, minutes) of the sampling event String Text [0.0.TEXTA315] n/a
date_to The final date (year, month, day) and time (hours, minutes) of the sampling event String Text [0.0.TEXTA315] n/a
varroa_count The number of varroa mites found in the sampling event Decimal number pms:naturalVarroaMiteFall [0.0.NMBRF371] mites d-1
quality_control The quality level of the sample collected (2 = examined with the BeeVS diagnostic system; 1 = examined manually by a trained group; 0 = examined manually by untrained individuals) Integer number Integer [0.0.NTGER313] n/a
hive_id The hive identifier String pms:beehiveID [0.0.HVEID216] n/a
yard_id The yard identifier String pms:apiaryID [0.0.PRYID342] n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 20. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
sampling_id 1 - 5 5,788.6 1 2,953.25 5,828.5 8,625.75 11,427 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 11,124 ( 100.0% )
date_from 11 - 14 n/a 1/1/19 13:00 n/a n/a n/a 9/9/18 15:30 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1,903 ( 17.1% )
date_to 11 - 14 n/a 1/12/18 16:0… n/a n/a n/a 9/9/20 18:00 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2,227 ( 20.0% )
varroa_count 1 - 4 28.3 0 1 4 16 5,016 11,124 0 ( 0.0% ) 2,301 ( 20.7% ) 0 ( 0.0% ) 387 ( 3.5% )
quality_control 1 - 1 0.9 0 0 1 2 2 11,124 0 ( 0.0% ) 3,824 ( 34.4% ) 0 ( 0.0% ) 3 ( 0.0% )
hive_id 1 - 4 820.6 1 289 715 1,199 2,501 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2,116 ( 19.0% )
yard_id 2 - 3 370.5 73 194 391 522 664 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 242 ( 2.2% )

Quality measures

Table 21. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
sampling_id 100.00% 100.00% 1 1
date_from 100.00% 17.11% 4/4/20 8:00 11/17/17 15:20
date_to 100.00% 20.02% 4/4/20 11:00 4/20/17 17:20
varroa_count 100.00% 3.48% 0 700
quality_control 100.00% 0.03% 1 2
hive_id 100.00% 19.02% 943 431
yard_id 100.00% 2.18% 87 404

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

weather

Table 22. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier VRRMN16.WTHER145.0
Name weather
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.WTHER145.0
Table Type File
Licence CC BY-SA 4.0
Description

Combined hourly weather data collected from 73 weather stations around Austria over the 8 years, totalling about 1.3 million rows.

Weather data are derived from NOAA, using Integrated Surface Data Lite (ISD-Lite). The ISD-Lite data contain a formatted subset of the complete Integrated Surface Data (ISD) for a number of elements. The data are based on data exchanged under the World Meteorological Organization (WMO) World Weather Watch Program according to WMO Resolution 40 (Cg-XII). The data for the Austrian weather stations were filtered from ftp://ftp.ncei.noaa.gov/pub/data/noaa/ by unique USAF, WBAN, and year. The hourly values of temperature, dew point, wind speed, pressure, and precipitation have been retained in the dataset in their original metric units. Each bee yard has been matched to the closest weather station. The dataset includes 73 weather stations, hourly values for 2012-2020, and 1.3 million records.
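
For readers working from the original NOAA files, an ISD-Lite record can be parsed roughly as follows. This sketch reflects a general understanding of the ISD-Lite layout (whitespace-separated integers, most elements scaled by 10, -9999 for missing values) and should be checked against NOAA's format documentation. The field names are hypothetical and do not describe the columns of the weather table in this dataset, which this report does not list.

```python
MISSING = -9999  # ISD-Lite marker for a missing value
# (name, scale) in file order, after year, month, day, hour.
# Names are illustrative; scaling factors are assumed from the ISD-Lite format.
FIELDS = [
    ("air_temp_c", 10), ("dew_point_c", 10), ("pressure_hpa", 10),
    ("wind_dir_deg", 1), ("wind_speed_ms", 10), ("sky_cover_code", 1),
    ("precip_1h_mm", 10), ("precip_6h_mm", 10),
]


def parse_isd_lite_line(line: str) -> dict:
    """Convert one ISD-Lite text line into a dict of scaled values."""
    parts = [int(token) for token in line.split()]
    record = dict(zip(("year", "month", "day", "hour"), parts[:4]))
    for (name, scale), raw in zip(FIELDS, parts[4:]):
        # Trace precipitation (encoded as -1) is not treated specially here.
        record[name] = None if raw == MISSING else raw / scale
    return record


# Hypothetical line, not taken from the dataset.
LINE = "2019    06   15   12    215    123  10152    180     31      4      0  -9999"
RECORD = parse_isd_lite_line(LINE)
print(RECORD)
```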

Metadata

n/a
Table 23. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 24. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct

Quality measures

Table 25. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

yard

Table 26. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier VRRMN16.YARDA146.0
Name yard
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.YARDA146.0
Table Type File
Licence CC BY-SA 4.0
Description

The table contains data on the apiaries (yards) in which the beehives sampled for Varroa were kept at the time of sampling. There are 242 unique yard_id values. Each bee yard has been matched to its closest weather station, which links the yards to the weather data.
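
The link from a sampling event to its weather station runs through yard_id and station_id. The sketch below shows this lookup with the standard library only; the column names come from the structural tables of this report, while the helper names and all row values are hypothetical.

```python
import csv
import io


def index_by(csv_text: str, key: str) -> dict:
    """Read CSV text and index its rows by the given key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def attach_station(sampling_rows, yards, stations):
    """Yield sampling rows extended with yard and station attributes."""
    for row in sampling_rows:
        yard = yards[row["yard_id"]]
        station = stations[yard["station_id"]]
        yield {**row, "nuts": yard["nuts"], "station_title": station["station_title"]}


# Hypothetical rows; only the column names come from the report.
YARDS = index_by("yard_id,elevation,nuts,station_id\n73,450,AT221,111750\n", "yard_id")
STATIONS = index_by(
    "station_id,station_title,latitude,longitude,station_elevation\n"
    "111750,EXAMPLE,47.0,15.4,340.0\n",
    "station_id",
)
SAMPLING = [{"sampling_id": "1", "yard_id": "73", "varroa_count": "4"}]
JOINED = list(attach_station(SAMPLING, YARDS, STATIONS))
print(JOINED[0]["station_title"])
```

The same station_id can then be used to select the matching hourly records from the weather table.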

Metadata

n/a
Table 27. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit
yard_id The yard identifier Integer number Integer [0.0.NTGER313] n/a
elevation Meters above Sea Level rounded to the nearest meter Integer number Integer [0.0.NTGER313] n/a
nuts NUTS is a geocode standard for referencing the administrative divisions of countries for statistical purposes: AT1 - East Austria (Burgenland (AT11), Lower Austria (AT12), Vienna (AT13)); AT2 - South Austria (Carinthia (AT21), Styria (AT22)); AT3 - West Austria (Upper Austria (AT31), Salzburg (AT32), Tyrol (AT33), Vorarlberg (AT34)). The current Nomenclature of Territorial Units for Statistics (NUTS) adopted by the European Union (Commission Delegated Regulation 2019/1755) is applied. String Text [0.0.TEXTA315] n/a
station_id The NOAA weather station identifier Integer number Integer [0.0.NTGER313] n/a

Metadata of individual tables can be found in Annex 1.
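
The values in the nuts column are five-character codes (for example AT221), so the higher NUTS levels can be derived from their prefixes. The following is a minimal sketch with hypothetical dictionary and function names; the province names are the English names of the nine Austrian provinces.

```python
# NUTS 2 codes for Austria's nine provinces (English names).
NUTS2_AT = {
    "AT11": "Burgenland", "AT12": "Lower Austria", "AT13": "Vienna",
    "AT21": "Carinthia", "AT22": "Styria",
    "AT31": "Upper Austria", "AT32": "Salzburg", "AT33": "Tyrol",
    "AT34": "Vorarlberg",
}
NUTS1_AT = {"AT1": "East Austria", "AT2": "South Austria", "AT3": "West Austria"}


def describe_nuts(code: str) -> tuple:
    """Return (NUTS 1 region, NUTS 2 province) for a code such as 'AT221'."""
    return NUTS1_AT[code[:3]], NUTS2_AT[code[:4]]


print(describe_nuts("AT221"))
```

Grouping yards by the four-character prefix gives a per-province view of the sampling effort.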

Descriptive measures

Table 28. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
yard_id 2 - 3 446.0 73 370.75 453.5 586.25 664 242 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 242 ( 100.0% )
elevation 3 - 4 510.0 150 324 450 637.75 1,413 242 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 164 ( 67.8% )
nuts 5 - 5 n/a AT111 n/a n/a n/a AT342 242 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 32 ( 13.2% )
station_id 6 - 6 111,866.0 110,010 110,600 111,750 112,960 113,900 242 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 73 ( 30.2% )

Quality measures

Table 29. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
yard_id 100.00% 100.00% 73 73
elevation 100.00% 67.77% 450 194
nuts 100.00% 13.22% AT221 AT314
station_id 100.00% 30.17% 111750 112440

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

References

  1. Rubinigg M., MacDonald M., Davenport V., Hassler E., Hassan A., Shala-Mayrhofer V. et al. 2023 Predicting Varroa: Longitudinal Data, Micro Climate, and Proximity Closeness Useful for Predicting Varroa Infestations (I1.A1). Data & Analytics for Good. [2023-11-4] data-for-good.pubpub.org

Annex 1: Table column reports

Table: hive

Column: hive_id

Table 30. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name hive_id
Description

The hive identifier

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

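Table 31. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).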
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
hive_id 1 - 4 1,187.7 1 562.25 1,187.5 1,788.75 2,501 2,116 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2,116 ( 100.0% )
Table 32. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
hive_id 100.00% 100.00% 1 1

Continuous Data Distribution

Figure 1. Distribution of values in the column.

Outliers

Figure 2. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 3. Visualization of completeness of the data in the column.

Uniqueness

Figure 4. Visualization of uniqueness of the data in the column.

Column: user_id

Table 33. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name user_id
Description

The user identifier

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

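Table 34. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).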
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
user_id 2 - 4 8,001.2 10 8,383 8,418 8,509 9,128 2,116 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 103 ( 4.9% )
Table 35. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
user_id 100.00% 4.87% 8418 8310

Continuous Data Distribution

Figure 5. Distribution of values in the column.

Outliers

Figure 6. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 7. Visualization of completeness of the data in the column.

Uniqueness

Figure 8. Visualization of uniqueness of the data in the column.

Table: station

Column: station_id

Table 36. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name station_id
Description

The NOAA weather station identifier

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

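Table 37. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).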
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
station_id 6 - 6 112,055.1 110,010 110,825 112,200 113,030 113,900 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 73 ( 100.0% )
Table 38. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
station_id 100.00% 100.00% 110010 110010

Continuous Data Distribution

Figure 9. Distribution of values in the column.

Outliers

Figure 10. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 11. Visualization of completeness of the data in the column.

Uniqueness

Figure 12. Visualization of uniqueness of the data in the column.

Column: station_title

Table 39. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name station_title
Description

The NOAA weather station name

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

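Table 40. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).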
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
station_title 4 - 26 n/a ALBERSCHWEND… n/a n/a n/a ZELTWEG/AUTO… 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 73 ( 100.0% )
Table 41. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
station_title 100.00% 100.00% WOLFSEGG WOLFSEGG

Completeness

Figure 13. Visualization of completeness of the data in the column.

Uniqueness

Figure 14. Visualization of uniqueness of the data in the column.

Column: latitude

Table 42. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name latitude
Description

Latitude coordinates of the station in decimal degrees in WGS84 standard.

Data type Decimal number
Descriptor DecimalNumber [UID:0.0.DCMLN314]
Descriptor description

Any of the rational or irrational numbers.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DCMLN314
Unit

n/a

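Table 43. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).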
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
latitude 2 - 6 47.5721 46.617 47.075 47.45 48.175 48.683 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 56 ( 76.7% )
Table 44. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
latitude 100.00% 76.71% 48.567 48.1

Continuous Data Distribution

Figure 15. Distribution of values in the column.

Outliers

Figure 16. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 17. Visualization of completeness of the data in the column.

Uniqueness

Figure 18. Visualization of uniqueness of the data in the column.

Column: longitude

Table 45. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name longitude
Description

Longitude coordinates of the station in decimal degrees in WGS84 standard.

Data type Decimal number
Descriptor DecimalNumber [UID:0.0.DCMLN314]
Descriptor description

Any of the rational or irrational numbers.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DCMLN314
Unit

n/a

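Table 46. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).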
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
longitude 2 - 6 14.4541 9.617 13.558 14.744 15.7665 16.6 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 66 ( 90.4% )
Table 47. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
longitude 100.00% 90.41% 16.367 13.667

Continuous Data Distribution

Figure 19. Distribution of values in the column.

Outliers

Figure 20. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 21. Visualization of completeness of the data in the column.

Uniqueness

Figure 22. Visualization of uniqueness of the data in the column.

Column: station_elevation

Table 48. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name station_elevation
Description

Elevation of the station in meters above sea level

Data type Decimal number
Descriptor DecimalNumber [UID:0.0.DCMLN314]
Descriptor description

Any of the rational or irrational numbers.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DCMLN314
Unit

n/a

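Table 49. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).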
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
station_elevation 3 - 6 536.72 153 306.05 486 715.7 1,209.7 73 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 73 ( 100.0% )
Table 50. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
station_elevation 100.00% 100.00% 615.6 615.6

Continuous Data Distribution

Figure 23. Distribution of values in the column.

Outliers

Figure 24. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 25. Visualization of completeness of the data in the column.

Uniqueness

Figure 26. Visualization of uniqueness of the data in the column.

Table: user

Column: user_id

Table 51. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name user_id
Description

The user identifier

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

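Table 52. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).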
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
user_id 2 - 4 8,403.9 10 8,354 8,411 8,617 9,128 99 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 99 ( 100.0% )
Table 53. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
user_id 100.00% 100.00% 10 10

Continuous Data Distribution

Figure 27. Distribution of values in the column.

Outliers

Figure 28. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 29. Visualization of completeness of the data in the column.

Uniqueness

Figure 30. Visualization of uniqueness of the data in the column.

Column: samples

Table 54. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name samples
Description

Total number of samples provided by the user

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

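Table 55. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).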
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
samples 1 - 4 111.6 1 10 25 96 3,058 99 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 65 ( 65.7% )
Table 56. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
samples 100.00% 65.66% 10 3058

Continuous Data Distribution

Figure 31. Distribution of values in the column.

Outliers

Figure 32. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 33. Visualization of completeness of the data in the column.

Uniqueness

Figure 34. Visualization of uniqueness of the data in the column.

Table: Varroa sampling

Column: sampling_id

Table 57. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name sampling_id
Description

The sampling event identifier

Data type String
Descriptor dwc:materialSampleID [UID:0.0.MTRLS489]
Descriptor description

A term from the Darwin Core standard:

An identifier for the dwc:MaterialSample (as opposed to a particular digital record of the dwc:MaterialSample). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the dwc:materialSampleID globally unique.

Descriptor target IRI http://rs.tdwg.org/dwc/terms/materialSampleID
Unit

n/a

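Table 58. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).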
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
sampling_id 1 - 5 5,788.6 1 2,953.25 5,828.5 8,625.75 11,427 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 11,124 ( 100.0% )
Table 59. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
sampling_id 100.00% 100.00% 1 1

Continuous Data Distribution

Figure 35. Distribution of values in the column.

Outliers

Figure 36. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 37. Visualization of completeness of the data in the column.

Uniqueness

Figure 38. Visualization of uniqueness of the data in the column.

Column: date_from

Table 60. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name date_from
Description

The first date (year, month, day) and time (hours, minutes) of the sampling event

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

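Table 61. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).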
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
date_from 11 - 14 n/a 1/1/19 13:00 n/a n/a n/a 9/9/18 15:30 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 1,903 ( 17.1% )
Table 62. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
date_from 100.00% 17.11% 4/4/20 8:00 11/17/17 15:20

Completeness

Figure 39. Visualization of completeness of the data in the column.

Uniqueness

Figure 40. Visualization of uniqueness of the data in the column.

Column: date_to

Table 63. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name date_to
Description

The final date (year, month, day) and time (hours, minutes) of the sampling event

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

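Table 64. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).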
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
date_to 11 - 14 n/a 1/12/18 16:0… n/a n/a n/a 9/9/20 18:00 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2,227 ( 20.0% )
Table 65. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
date_to 100.00% 20.02% 4/4/20 11:00 4/20/17 17:20
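Because date_from and date_to are stored as strings, the Minimum and Maximum reported for them are lexicographic, not chronological. The sketch below parses the values into timestamps and contrasts the two orderings. The format string is an assumption inferred from values such as 11/17/17 15:20 (month/day/two-digit year, 24-hour time); the helper name `parse_timestamp` is hypothetical.

```python
from datetime import datetime

# Assumed layout: month/day/two-digit year, then hour:minute (24-hour clock).
FORMAT = "%m/%d/%y %H:%M"


def parse_timestamp(text):
    """Convert a date_from / date_to string into a datetime object."""
    return datetime.strptime(text, FORMAT)


# Values quoted in the date_from tables of this report.
samples = ["1/1/19 13:00", "9/9/18 15:30", "11/17/17 15:20", "4/4/20 8:00"]

# String comparison reproduces the Minimum and Maximum shown for date_from.
lexicographic = (min(samples), max(samples))

# Comparing parsed timestamps gives the earliest and latest of these four.
chronological = (min(samples, key=parse_timestamp),
                 max(samples, key=parse_timestamp))
```

Parsing before sorting or filtering avoids treating 9/9/18 as later than 4/4/20.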

Completeness

Figure 41. Visualization of completeness of the data in the column.

Uniqueness

Figure 42. Visualization of uniqueness of the data in the column.

Column: varroa_count

Table 66. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name varroa_count
Description

The number of varroa mites found in the sampling event

Data type Decimal number
Descriptor pms:naturalVarroaMiteFall [UID:0.0.NMBRF371]
Descriptor description

The quantity infestation rate of adult honey bee colonies with Varroa mites (Varroa destructor), measured as natural mite fall on a sticky board placed under the brood nest of a honey bee colony, expressed in number of Varroa mites per day.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NMBRF371
Unit

mites d⁻¹

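Table 67. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).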
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
varroa_count 1 - 4 28.3 0 1 4 16 5,016 11,124 0 ( 0.0% ) 2,301 ( 20.7% ) 0 ( 0.0% ) 387 ( 3.5% )
Table 68. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
varroa_count 100.00% 3.48% 0 700
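The column description speaks of the number of mites found in a sampling event, whereas the descriptor unit is mites per day. The report does not state whether the stored values are already normalised. If they are raw counts, a daily rate could be derived from the sampling interval as sketched below; the function name `daily_mite_fall`, the date format and the example values are assumptions.

```python
from datetime import datetime

# Assumed layout of date_from / date_to: month/day/two-digit year hour:minute.
FORMAT = "%m/%d/%y %H:%M"


def daily_mite_fall(count, date_from, date_to):
    """Return mites per day for a raw count collected over an interval."""
    start = datetime.strptime(date_from, FORMAT)
    end = datetime.strptime(date_to, FORMAT)
    days = (end - start).total_seconds() / 86400  # seconds per day
    if days <= 0:
        raise ValueError("date_to must be later than date_from")
    return count / days


# Hypothetical sampling event: 16 mites over four days.
rate = daily_mite_fall(16, "4/1/20 8:00", "4/5/20 8:00")
```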

Continuous Data Distribution

Figure 43. Distribution of values in the column.

Outliers

Figure 44. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 45. Visualization of completeness of the data in the column.

Uniqueness

Figure 46. Visualization of uniqueness of the data in the column.

Column: quality_control

Table 69. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name quality_control
Description

The quality level of the sample collected

  • 2 = examined with the BeeVS diagnostic system
  • 1 = examined manually by a trained group
  • 0 = examined manually by untrained individuals
Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

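Table 70. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).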
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
quality_control 1 - 1 0.9 0 0 1 2 2 11,124 0 ( 0.0% ) 3,824 ( 34.4% ) 0 ( 0.0% ) 3 ( 0.0% )
Table 71. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
quality_control 100.00% 0.03% 1 2
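The three quality_control codes can be used to restrict an analysis to samples examined at or above a given level. A minimal sketch, assuming each record is available as a dictionary with a `quality_control` key; the enum member names and the helper `at_least` are hypothetical.

```python
from enum import IntEnum


class QualityControl(IntEnum):
    """Quality levels as listed in the quality_control column description."""
    UNTRAINED_MANUAL = 0  # examined manually by untrained individuals
    TRAINED_MANUAL = 1    # examined manually by a trained group
    BEEVS = 2             # examined with the BeeVS diagnostic system


def at_least(rows, level):
    """Keep the rows whose quality level is greater than or equal to level."""
    return [row for row in rows
            if QualityControl(int(row["quality_control"])) >= level]


# Hypothetical records, not taken from the dataset.
rows = [
    {"sampling_id": "1", "quality_control": "0"},
    {"sampling_id": "2", "quality_control": "1"},
    {"sampling_id": "3", "quality_control": "2"},
]
trusted = at_least(rows, QualityControl.TRAINED_MANUAL)
```

Converting through `QualityControl(...)` raises a `ValueError` for any code outside 0 to 2, which surfaces unexpected values early.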

Data Distribution Top 20

Figure 47. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 48. Distribution of values in the column.

Outliers

Figure 49. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 50. Visualization of completeness of the data in the column.

Uniqueness

Figure 51. Visualization of uniqueness of the data in the column.

Column: hive_id

Table 72. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name hive_id
Description

The hive identifier

Data type String
Descriptor pms:beehiveID [UID:0.0.HVEID216]
Descriptor description

Unique sequence of characters associated with a beehive, which is specific to a dataset, to an apiary or to a beekeeper.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.HVEID216
Unit

n/a

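Table 73. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).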
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
hive_id 1 - 4 820.6 1 289 715 1,199 2,501 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 2,116 ( 19.0% )
Table 74. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
hive_id 100.00% 19.02% 943 431
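The Varroa sampling table reports 2,116 distinct hive_id values, the same number as there are records in the hive table, which suggests that hive_id acts as a foreign key. A sketch of a referential check follows. The CSV snippets are made up, the helper `orphan_keys` is hypothetical, and keys are compared as strings because hive_id is typed as Integer in one table and String in the other.

```python
import csv
import io


def orphan_keys(child_rows, parent_rows, key):
    """Return key values used in child_rows that are absent from parent_rows."""
    parent_keys = {str(row[key]).strip() for row in parent_rows}
    child_keys = {str(row[key]).strip() for row in child_rows}
    return sorted(child_keys - parent_keys)


# Made-up stand-ins for hive.csv and varroa_sampling.csv (column layout assumed).
hive_csv = io.StringIO("hive_id,user_id\n1,10\n2,10\n")
sampling_csv = io.StringIO("sampling_id,hive_id\n1,1\n2,2\n3,7\n")

orphans = orphan_keys(csv.DictReader(sampling_csv),
                      csv.DictReader(hive_csv),
                      "hive_id")
```

An empty result means every sampled hive is described in the parent table; the same check would apply to yard_id against the yard table.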

Continuous Data Distribution

Figure 52. Distribution of values in the column.

Outliers

Figure 53. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 54. Visualization of completeness of the data in the column.

Uniqueness

Figure 55. Visualization of uniqueness of the data in the column.

Column: yard_id

Table 75. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name yard_id
Description

The yard identifier

Data type String
Descriptor pms:apiaryID [UID:0.0.PRYID342]
Descriptor description

Unique sequence of characters associated with an apiary, which is specific to a dataset or to a beekeeper.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.PRYID342
Unit

n/a

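Table 76. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).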
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
yard_id 2 - 3 370.5 73 194 391 522 664 11,124 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 242 ( 2.2% )
Table 77. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
yard_id 100.00% 2.18% 87 404

Continuous Data Distribution

Figure 56. Distribution of values in the column.

Outliers

Figure 57. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 58. Visualization of completeness of the data in the column.

Uniqueness

Figure 59. Visualization of uniqueness of the data in the column.

Table: weather

Table: yard

Column: yard_id

Table 78. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name yard_id
Description

The yard identifier

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

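Table 79. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).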
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
yard_id 2 - 3 446.0 73 370.75 453.5 586.25 664 242 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 242 ( 100.0% )
Table 80. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
yard_id 100.00% 100.00% 73 73

Continuous Data Distribution

Figure 60. Distribution of values in the column.

Outliers

Figure 61. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 62. Visualization of completeness of the data in the column.

Uniqueness

Figure 63. Visualization of uniqueness of the data in the column.

Column: elevation

Table 81. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name elevation
Description

Elevation of the yard in meters above sea level, rounded to the nearest meter

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

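Table 82. Content analysis of the table. Column name (Column Name); range of length of characters (Range); arithmetic mean of values in column (Mean); lowest value in column (Minimum); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Maximum); number of records (Total); number and percentage (in parentheses) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).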
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
elevation 3 - 4 510.0 150 324 450 637.75 1,413 242 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 164 ( 67.8% )
Table 83. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
elevation 100.00% 67.77% 450 194
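
The Completeness and Uniqueness percentages can be reproduced from the counts in the statistics table for this column. The following is a minimal sketch, assuming that completeness is the share of non-missing values and uniqueness is the share of distinct values among all values. The function names are illustrative and are not part of the EU Pollinator Hub.

```python
def completeness(total: int, missing: int) -> float:
    """Share of non-missing values, in percent (assumed definition)."""
    return 100.0 * (total - missing) / total


def uniqueness(total: int, distinct: int) -> float:
    """Share of distinct values among all values, in percent (assumed definition)."""
    return 100.0 * distinct / total


# Counts reported for the elevation column: 242 values, 0 missing, 164 distinct.
elevation_completeness = completeness(total=242, missing=0)
elevation_uniqueness = uniqueness(total=242, distinct=164)

print(f"{elevation_completeness:.2f}%")  # 100.00%
print(f"{elevation_uniqueness:.2f}%")    # 67.77%
```

Under the same assumed definitions, the counts for nuts (32 distinct of 242) and station_id (73 distinct of 242) give 13.22% and 30.17%, matching the values reported for those columns.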

Continuous Data Distribution

Figure 64. Distribution of values in the column.

Outliers

Figure 65. Visualization of median, min, max, and outliers in the column.
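
The report does not state which rule is used to flag outliers in this figure. As a sketch under the assumption of the common 1.5 × IQR rule (Tukey's fences), the quartiles reported for elevation (Q1 = 324, Q3 = 637.75) give the bounds below. The function name is illustrative.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) bounds outside which values count as outliers."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


# Quartiles, minimum and maximum as reported for the elevation column.
lower, upper = tukey_fences(q1=324, q3=637.75)
reported_min, reported_max = 150, 1413

print(lower, upper)          # -146.625 1108.375
print(reported_min < lower)  # False
print(reported_max > upper)  # True
```

Under this assumed rule, the reported maximum of 1,413 m lies above the upper fence and would be flagged, while the reported minimum of 150 m would not.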

Completeness

Figure 66. Visualization of completeness of the data in the column.

Uniqueness

Figure 67. Visualization of uniqueness of the data in the column.

Column: nuts

Table 84. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name nuts
Description

NUTS is a geocode standard for referencing the administrative divisions of countries for statistical purposes.

  • AT1 - East Austria; Burgenland (AT11), Lower Austria (AT12), Vienna (AT13)
  • AT2 - South Austria; Carinthia (AT21), Styria (AT22)
  • AT3 - West Austria; Upper Austria (AT31), Salzburg (AT32), Tyrol (AT33), Vorarlberg (AT34)

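The current Nomenclature of Territorial Units for Statistics (NUTS) adopted by the European Union (Commission Delegated Regulation (EU) 2019/1755) is applied. The values in this column are five-character NUTS level 3 codes (for example AT221), which extend the level 1 and level 2 codes listed above by one further character.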

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a
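
NUTS codes are hierarchical: a two-letter country code is extended by one character per level. The following is a minimal sketch of how the parent regions can be derived from a value in this column, assuming the column holds five-character codes as the reported value length of 5 suggests. The function name and the returned keys are illustrative.

```python
def nuts_levels(code: str) -> dict[str, str]:
    """Split a five-character NUTS level 3 code into its parent codes."""
    if len(code) != 5 or not code[:2].isalpha():
        raise ValueError(f"expected a five-character NUTS 3 code, got {code!r}")
    return {
        "country": code[:2],  # two-letter country code
        "nuts1": code[:3],    # major region, e.g. AT2 (South Austria)
        "nuts2": code[:4],    # federal state, e.g. AT22 (Styria)
        "nuts3": code,        # value as stored in the column
    }


print(nuts_levels("AT221"))
# {'country': 'AT', 'nuts1': 'AT2', 'nuts2': 'AT22', 'nuts3': 'AT221'}
```

Grouping yards by the `nuts1` or `nuts2` prefix is one way to aggregate results to the regions listed in the column description.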

Table 85. Summary statistics of the column. Column name (Column Name); length of the stored values in characters (Range); arithmetic mean (Mean); minimum value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum value (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
nuts 5 - 5 n/a AT111 n/a n/a n/a AT342 242 0 (0.0%) 0 (0.0%) 0 (0.0%) 32 (13.2%)
Table 86. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
nuts 100.00% 13.22% AT221 AT314

Completeness

Figure 68. Visualization of completeness of the data in the column.

Uniqueness

Figure 69. Visualization of uniqueness of the data in the column.

Column: station_id

Table 87. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name station_id
Description

The NOAA weather station identifier

Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description

A number with no fractional part, including the negative and positive numbers as well as zero.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit

n/a

Table 88. Summary statistics of the column. Column name (Column Name); length of the stored values in characters (Range); arithmetic mean (Mean); minimum value (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum value (Maximum); total number of values (Total); number of missing values (Missing); number of zero values (Zero); number of blank values (Blank); number of distinct values (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
station_id 6 - 6 111,866.0 110,010 110,600 111,750 112,960 113,900 242 0 (0.0%) 0 (0.0%) 0 (0.0%) 73 (30.2%)
Table 89. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
station_id 100.00% 30.17% 111750 112440

Continuous Data Distribution

Figure 70. Distribution of values in the column.

Outliers

Figure 71. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 72. Visualization of completeness of the data in the column.

Uniqueness

Figure 73. Visualization of uniqueness of the data in the column.