Dataset Report

Unique identifier:	VRRMN16.0.0
Title:	Varroa monitoring Austria
Long title:	Results from a Varroa monitoring program in various apiaries in Austria
Status:	Quality Validated
Current Version:	v. 1.0
Published:	2023-12-13
Reviewed by:
Citation proposal:	Biene Österreich – Imkereidachverband 2023 Report of dataset Varroa monitoring Austria, v. 1.0 [VRRMN16.0.0]. EU Pollinator Hub. [2026-05-16] app.pollinatorhub.eu

Compliance with FAIR* principles
Findable	Accessible	Interoperable	Reusable
See https://www.go-fair.org/fair-principles for more information about FAIR principles

Data Quality

Under evaluation

Table of contents

Document history
1. Release
2. Revision
Abbreviations
Executive summary
Introduction
Material and methods
Data description
1. Dataset
2. Tables
  1. hive
  2. station
  3. user
  4. Varroa sampling
  5. weather
  6. yard
References
Annex 1: Table column reports

Document history

Release

Version v. 1.0 released on 2023-12-13.

Revision

Table 1. List of revisions made to the document. Identifier of revision (No); date of revision (Date); description of revision (Description); reason for revision (Reason).

No	Date	Description	Reason
1	2023-12-13 00:12:00	Initial release.	n/a

Abbreviations

No abbreviations.

Executive summary

Data overview:

n/a

Data value:

n/a

Data description:

n/a

Data application:

n/a

Unresolved issues:

n/a

Introduction

n/a

Material and methods

Data acquisition

n/a

Table 2. List of raw data and metadata files included in the dataset. Identifier of table row (No); name of the file (File); the type of the file (Type); file contains data (D); file contains metadata (M); date of upload of the file to the EU Pollinator Hub (Arrival); number of data points contained within the file, if applicable (Data points); uploaded file size (File size).

No	File	Type	D	M	Arrival	Data points	File size
1	hive.csv	CSV - Comma separated values	Yes	No	2023-12-09 09:12:40	4,232	21.50 KiB
2	station.csv	CSV - Comma separated values	Yes	No	2023-12-09 09:12:07	365	2.78 KiB
3	user.csv	CSV - Comma separated values	Yes	No	2023-12-09 09:12:43	198	906.00 B
4	varroa_sampling.csv	CSV - Comma separated values	Yes	No	2023-12-09 09:12:09	77,868	498.51 KiB
5	weather.csv	CSV - Comma separated values	Yes	No	2025-08-21 08:08:20	14,583,965	209.89 MiB
6	yard.csv	CSV - Comma separated values	Yes	No	2023-12-09 10:12:52	968	5.22 KiB
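
The "Data points" values in Table 2 appear to equal the number of data rows multiplied by the number of columns (for example, 2,116 rows x 2 columns = 4,232 for hive.csv). This reading is inferred from the record counts reported later in this document, not from a stated definition. A minimal Python sketch under that assumption, using a made-up stand-in for hive.csv and a hypothetical helper name:

```python
import csv
import io


def count_data_points(csv_text: str) -> int:
    """Return rows x columns for CSV text, not counting the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Ignore completely empty lines (e.g. a trailing newline).
    n_rows = sum(1 for row in reader if row)
    return n_rows * len(header)


# Made-up stand-in for hive.csv: two columns, three data rows.
sample = "hive_id,user_id\n1,10\n2,10\n3,8418\n"
points = count_data_points(sample)
print(points)
```

The same product reproduces the other rows of Table 2, for example 11,124 x 7 = 77,868 for varroa_sampling.csv.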

Data preparation

n/a

Data validation

n/a

Data analysis

n/a

Data description

Dataset

Table 3. Summary of tables belonging to the dataset. Table row identifier (No); name of the table (Table); description of the table (Description).

No	Table	Description
1	hive	The table maps the relationship between beekeepers (anonymised users of the web application which is used by beekeepers to provide…
2	station	The table contains geographic information of the NOAA weather stations contained in the dataset. There are 73 unique weather stations…
3	user	The table contains the number of samples (Varroa infestation data of beehives) that were provided by each single user. The…
4	Varroa sampling	This table contains data on Varroa infestation levels (the number of varroa mites found in the sampling event) measured in…
5	weather	The combined hourly weather data collected from 73 weather stations around Austria for the 8 years, total ~1.3 million rows.…
6	yard	The table contains data on the apiaries (yards) at which the beehives for which the Varroa samples were obtained, were…

Table 4. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
interactions.single.uid	VRRMN16.0.0
Title	Varroa monitoring Austria
Long title	Results from a Varroa monitoring program in various apiaries in Austria
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/VRRMN16.0.0
interactions.single.section-details.licence	CC BY-SA 4.0
DOI	n/a
Created	2022-03-14
Published	2023-12-13
Contact	n/a
Keywords	Austria, Varroa destructor, monitoring
Data collection years	2012-2020
Regions, the data was collected in	Österreich
Abstract	An eight-year survey of Varroa destructor infestation rates of western honey bee (Apis mellifera) colonies across Austria and the spatial dimension, temporal dimension and weather factors that impact these infestation rates.

Table 5. Standardised metadata of the data provider Biene Österreich – Imkereidachverband. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Name	Biene Österreich – Imkereidachverband
Url
Acronym	BÖ
IRI	https://app.pollinatorhub.eu/data-providers/boe
Address	Georg-Coch Platz 3/11a, 1010 Wien, Austria
Country	Austria
Contact	Georg-Coch Platz 3/11a, 1010 Wien www.biene-oesterreich.at office@biene-oesterreich.at
Description	The Austrian Beekeepers Federation (BÖ, Biene Österreich-Imkereidachverband) is the umbrella organisation of the two largest beekeeping associations in Austria, the Austrian Beekeepers Association (ÖIB, Österreichischer Imkerbund) and the Austrian Professional Beekeepers Association (ÖEIB, Österreichischer Erwerbsimkerbund).

Tables

hive

Table 6. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	VRRMN16.HIVEA141.0
Name	hive
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.HIVEA141.0
Table Type	File
Licence	CC BY-SA 4.0
Description	The table maps the relationship between beekeepers (anonymised users of the web application that beekeepers use to report Varroa infestation data for their bee yards) and the hives for which they reported Varroa infestation levels. There are 99 unique user_id values and 2,116 unique hive_id values.

Metadata

n/a

Table 7. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Column Name	Column Description	Datatype	Descriptor	Unit
hive_id	The hive identifier	Integer number	pms:beehiveID [0.0.HVEID216]	n/a
user_id	The user identifier	Integer number	pms:userID [0.0.SERID483]	n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 8. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
hive_id	1 - 4	1,187.7	1	562.25	1,187.5	1,788.75	2,501	2,116	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2,116 ( 100.0% )
user_id	2 - 4	8,001.2	10	8,383	8,418	8,509	9,128	2,116	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	103 ( 4.9% )

Quality measures

Table 9. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
hive_id	100.00%	100.00%	1	1
user_id	100.00%	4.87%	8418	8310
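
The completeness and uniqueness figures in the quality tables are consistent with two simple ratios: non-missing values over all records, and distinct values over all records (for user_id above, 103 / 2,116 = 4.87%). The Python sketch below illustrates these ratios on a made-up column. The function names are hypothetical, and the formulas are inferred from the reported numbers rather than taken from a documented EU Pollinator Hub definition.

```python
def completeness_pct(values):
    """Percentage of entries that are not missing (None marks a missing value)."""
    total = len(values)
    missing = sum(1 for v in values if v is None)
    return round(100.0 * (total - missing) / total, 2)


def uniqueness_pct(values):
    """Percentage of distinct values, counting None as one distinct value."""
    return round(100.0 * len(set(values)) / len(values), 2)


# Made-up column with one missing entry and one repeated value.
column = [10, 10, 8418, None]
print(completeness_pct(column), uniqueness_pct(column))
```

Both helpers assume a non-empty column; an empty list would need a separate guard.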

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

station

Table 10. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	VRRMN16.STTNA142.0
Name	station
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.STTNA142.0
Table Type	File
Licence	CC BY-SA 4.0
Description	The table contains geographic information on the NOAA weather stations contained in the dataset. There are 73 unique weather stations in this dataset. They lie between latitudes 46.617 and 48.683 and longitudes 9.617 and 16.600. Because these data are publicly available, the coordinates are not blurred. The stations are located at elevations between 153 meters and 1210 meters above sea level. 90% of the yard elevations are within 300 meters of the weather station elevation. This corresponds to a difference of between 0 and 6 degrees Celsius in air temperature, which can be calculated from the data provided to improve the accuracy of analyses.
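
One possible way to account for the elevation difference between a yard and its weather station is a constant lapse-rate correction. The Python sketch below assumes a rate of 6.5 degrees Celsius per 1000 m, which is a common standard-atmosphere value and is not stated in this report; the function name and the example elevations are likewise illustrative only.

```python
# Assumed constant lapse rate (degrees Celsius per meter); not taken from the report.
LAPSE_RATE_C_PER_M = 0.0065


def adjust_temperature(station_temp_c, station_elev_m, yard_elev_m,
                       lapse_rate=LAPSE_RATE_C_PER_M):
    """Estimate air temperature at the yard from the station reading.

    Temperature is lowered when the yard lies above the station and
    raised when it lies below.
    """
    return station_temp_c - lapse_rate * (yard_elev_m - station_elev_m)


# Hypothetical yard located 300 m above its station.
estimate = adjust_temperature(10.0, 486.0, 786.0)
print(round(estimate, 2))
```

A different rate can be passed through the `lapse_rate` argument if local conditions suggest one.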

Metadata

n/a

Table 11. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Column Name	Column Description	Datatype	Descriptor	Unit
station_id	The NOAA weather station identifier	Integer number	Integer [0.0.NTGER313]	n/a
station_title	The NOAA weather Station Name	String	Text [0.0.TEXTA315]	n/a
latitude	Latitude coordinates of the station in decimal degrees in WGS84 standard.	Decimal number	dwc:decimalLatitude [0.0.LTTDE333]	°
longitude	Longitude coordinates of the station in decimal degrees in WGS84 standard.	Decimal number	dwc:decimalLongitude [0.0.LNGTD332]	°
station_elevation	Meters above Sea Level	Decimal number	pms:heightAboveMeanSeaLevel [0.0.HGHTB393]	m

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 12. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
station_id	6 - 6	112,055.1	110,010	110,825	112,200	113,030	113,900	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	73 ( 100.0% )
station_title	4 - 26	n/a	LINZ	n/a	n/a	n/a	KRUMBACH ID…	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	73 ( 100.0% )
latitude	2 - 6	47.5721	46.617	47.075	47.45	48.175	48.683	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	56 ( 76.7% )
longitude	2 - 6	14.4541	9.617	13.558	14.744	15.7665	16.6	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	66 ( 90.4% )
station_elevation	3 - 6	536.72	153	306.05	486	715.7	1,209.7	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	73 ( 100.0% )

Quality measures

Table 13. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
station_id	100.00%	100.00%	110010	110010
station_title	100.00%	100.00%	WOLFSEGG	WOLFSEGG
latitude	100.00%	76.71%	48.567	48.1
longitude	100.00%	90.41%	16.367	13.667
station_elevation	100.00%	100.00%	615.6	615.6

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

user

Table 14. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	VRRMN16.USERA143.0
Name	user
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.USERA143.0
Table Type	File
Licence	CC BY-SA 4.0
Description	The table contains the number of samples (Varroa infestation data of beehives) provided by each user. The total number of samples collected is 11,124. It is important to note that there is a strong bias in the origin of the samples. A single user provided 27% of the samples in this dataset. About 53% of the samples are derived from 22 users who each provided 100 to 999 samples. 18% of the samples are from a group of 56 users who each provided 10 to 99 samples. 1% of the samples were given by 24 users who each entered fewer than 10 samples.
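
The contribution bias described above can be summarised by grouping users into the same sample-count ranges and computing each group's share of all samples. The following Python sketch does this for a made-up list of per-user sample counts. The bucket edges follow the ranges in the description; the function name and the toy values are hypothetical.

```python
# Bucket edges mirroring the ranges in the description; None means no upper bound.
BUCKETS = [(1, 9), (10, 99), (100, 999), (1000, None)]


def bucket_shares(samples_per_user):
    """Map each bucket label to (number of users, percent of all samples)."""
    total = sum(samples_per_user)
    summary = {}
    for low, high in BUCKETS:
        in_bucket = [n for n in samples_per_user
                     if n >= low and (high is None or n <= high)]
        label = f"{low}+" if high is None else f"{low}-{high}"
        summary[label] = (len(in_bucket), round(100.0 * sum(in_bucket) / total, 1))
    return summary


# Made-up counts for five users (2,000 samples in total).
toy_counts = [1000, 500, 400, 90, 10]
shares = bucket_shares(toy_counts)
print(shares)
```

Applied to the `samples` column of user.csv, the same grouping should give shares comparable to those quoted in the description.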

Metadata

n/a

Table 15. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Column Name	Column Description	Datatype	Descriptor	Unit
user_id	The user identifier	Integer number	pms:userID [0.0.SERID483]	n/a
samples	Total numbers of samples provided by a user	Integer number	Integer [0.0.NTGER313]	no.

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 16. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
user_id	2 - 4	8,403.9	10	8,354	8,411	8,617	9,128	99	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	99 ( 100.0% )
samples	1 - 4	111.6	1	10	25	96	3,058	99	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	65 ( 65.7% )

Quality measures

Table 17. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
user_id	100.00%	100.00%	10	10
samples	100.00%	65.66%	10	3058

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

Varroa sampling

Table 18. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	VRRMN16.VRRSM144.0
Name	Varroa sampling
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.VRRSM144.0
Table Type	File
Licence	CC BY-SA 4.0
Description	This table contains data on Varroa infestation levels (the number of varroa mites found in the sampling event) measured in individual hives on a given apiary at a given time with a given quality standard. Varroa samples were collected from citizen and mined data using 3 standard method sources between the years 2012-2020. This data contains 11124 varroa sampling events. Roughly 21% of these events record zero mites present. The highest number of mites present in a single sampling event is 5016. Sampling events were collected from 04/02/12 to 11/11/20 and last on average 7.2 days each (range: 1.0-23.8 days). Roughly 75% of Varroa sampling events last between 3 and 9 days. Data on mite infestation levels were collected from 3 sources by a standard method (natural mite fall) from 2012 to 2020, mainly in the spring, early summer, and late summer. The 3 sources differ in quality. Data from the highest-quality source, described as quality_control=2, were examined with the BeeVS diagnostic system (Apisfero, Turin, Italy), which consists of a high-resolution scanner to take a picture of the samples (sticky boards placed under the brood nest of colonies) and cloud-based software used to count the number of mites on the sticky boards. Data from the intermediate source are described as quality_control=1 and were examined manually by a trained group. Data from the poorest-quality source are described as quality_control=0 and were examined manually by untrained individuals according to a classification scheme. Data were entered via a web terminal by whoever analyzed the sample. The software vetted the data for plausibility (rejection of values that exceed 100 mites/day) and completeness (rejection of values that did not fall within a 3-day to 21-day measuring interval). Data exceeding these limits, which can be found in the data set, have been imported from external resources and have been approved by the supervisor. The data collected by untrained individuals were checked by the supervisor for plausibility. From 2012 to 2016 the project was only implemented in the Austrian province of Styria, where approximately 3500 beekeepers supervised 53000 to 56000 honeybee colonies. In 2017 the crowdsourcing initiative was extended to all nine Austrian provinces, with 28032 to 30237 beekeepers and 329402 to 390607 honeybee colonies in their care. The total number of samples collected is 11124, of which 4033 (36%) were medium quality samples (QC=1) and 3267 (29%) were high quality samples (QC=2). The varroa survey dataset includes 99 users (beekeepers), 242 bee yards (apiaries), and 2,116 hives from the nine Austrian provinces for a total of 11124 records pertaining to varroa infestation.
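
The plausibility and completeness rules described above (at most 100 mites/day, measuring interval of 3 to 21 days) can be re-checked on individual records. The Python sketch below is one way to do so, not the web application's actual validation code. It assumes that date_from and date_to use a month/day/two-digit-year format with hours and minutes (as in values such as "7/6/17 9:00") and that varroa_count is the total count for the sampling event; both points are assumptions, and the function name is hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout, inferred from values such as "7/6/17 9:00".
DATE_FORMAT = "%m/%d/%y %H:%M"


def check_sampling_event(date_from, date_to, varroa_count):
    """Return duration, daily mite fall and the two rule checks for one record."""
    start = datetime.strptime(date_from, DATE_FORMAT)
    end = datetime.strptime(date_to, DATE_FORMAT)
    days = (end - start).total_seconds() / 86400.0
    # Assumption: varroa_count is the event total, so divide by the duration.
    mites_per_day = varroa_count / days if days > 0 else float("inf")
    return {
        "days": days,
        "mites_per_day": mites_per_day,
        "interval_ok": 3.0 <= days <= 21.0,
        "rate_ok": mites_per_day <= 100.0,
    }


# Made-up record: a 7-day sampling event with 140 mites in total.
result = check_sampling_event("7/6/17 9:00", "7/13/17 9:00", 140)
print(result)
```

Records that fail either check are not necessarily errors, since the description notes that externally imported data may exceed these limits.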

Metadata

n/a

Table 19. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Column Name	Column Description	Datatype	Descriptor	Unit
sampling_id	The sampling event identifier	String	dwc:materialSampleID [0.0.MTRLS489]	n/a
date_from	The first date (year, month, day) and time (hours, minutes) of the sampling event	String	Text [0.0.TEXTA315]	n/a
date_to	The final date (year, month, day) and time (hours, minutes) of the sampling event	String	Text [0.0.TEXTA315]	n/a
varroa_count	The number of varroa mites found in the sampling event	Decimal number	pms:naturalVarroaMiteFall [0.0.NMBRF371]	mites d-1
quality_control	The quality level of the sample collected: 2 = examined with the BeeVS diagnostic system; 1 = examined manually by a trained group; 0 = examined manually by untrained individuals	Integer number	Integer [0.0.NTGER313]	n/a
hive_id	The hive identifier	String	pms:beehiveID [0.0.HVEID216]	n/a
yard_id	The yard identifier	String	pms:apiaryID [0.0.PRYID342]	n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 20. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
sampling_id	1 - 5	5,788.6	1	2,953.25	5,828.5	8,625.75	11,427	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	11,124 ( 100.0% )
date_from	11 - 14	n/a	7/6/17 9:00	n/a	n/a	n/a	10/20/20 12:…	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1,903 ( 17.1% )
date_to	11 - 14	n/a	8/1/17 7:30	n/a	n/a	n/a	11/10/20 12:…	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2,227 ( 20.0% )
varroa_count	1 - 4	28.3	0	1	4	16	5,016	11,124	0 ( 0.0% )	2,301 ( 20.7% )	0 ( 0.0% )	387 ( 3.5% )
quality_control	1 - 1	0.9	0	0	1	2	2	11,124	0 ( 0.0% )	3,824 ( 34.4% )	0 ( 0.0% )	3 ( 0.0% )
hive_id	1 - 4	820.6	1	289	715	1,199	2,501	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2,116 ( 19.0% )
yard_id	2 - 3	370.5	73	194	391	522	664	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	242 ( 2.2% )

Quality measures

Table 21. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
sampling_id	100.00%	100.00%	1	1
date_from	100.00%	17.11%	4/4/20 8:00	11/17/17 15:20
date_to	100.00%	20.02%	4/4/20 11:00	4/20/17 17:20
varroa_count	100.00%	3.48%	0	700
quality_control	100.00%	0.03%	1	2
hive_id	100.00%	19.02%	943	431
yard_id	100.00%	2.18%	87	404

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

weather

Table 22. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	VRRMN16.WTHER145.0
Name	weather
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.WTHER145.0
Table Type	File
Licence	CC BY-SA 4.0
Description	The combined hourly weather data collected from 73 weather stations around Austria for the 8 years, total ~1.3 million rows. Weather data are derived from NOAA using Integrated Surface Data Lite (ISD-Lite). The ISD-Lite data contains a formatted subset of the complete Integrated Surface Data (ISD) for a number of elements. The data are based on data exchanged under the World Meteorological Organization (WMO) World Weather Watch Program according to WMO Resolution 40 (Cg-XII). The data of the Austrian weather stations have been filtered from ftp://ftp.ncei.noaa.gov/pub/data/noaa/ by unique USAF, WBAN, and year. The hourly values of temperature, dew point, wind speed, pressure, and precipitation have been maintained in the data set and preserved in original metric measurements. Each bee yard has been matched to the closest weather station. The dataset includes 73 weather stations, 2012-2020 hourly values, and 1.3 million records.
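
Hourly weather records can be summarised over the duration of a sampling event, for example as a mean air temperature for the matched station. The Python sketch below illustrates this on made-up rows. It assumes that the date and hour columns combine into an ISO 8601 timestamp (hour values look like "06:00:00+00:00"), that missing air_temp values are represented as None after loading, and that the window bounds are given in UTC; the function name is hypothetical.

```python
from datetime import datetime, timezone


def mean_air_temp(weather_rows, station_id, start_utc, end_utc):
    """Mean air_temp for one station within [start_utc, end_utc], skipping gaps."""
    temps = []
    for row in weather_rows:
        if row["station_id"] != station_id or row["air_temp"] is None:
            continue
        # Combine the date and hour columns into a timezone-aware timestamp.
        stamp = datetime.fromisoformat(f'{row["date"]}T{row["hour"]}')
        if start_utc <= stamp <= end_utc:
            temps.append(row["air_temp"])
    return sum(temps) / len(temps) if temps else None


# Made-up rows for two stations; one reading is missing.
rows = [
    {"station_id": 110010, "date": "2017-07-06", "hour": "09:00:00+00:00", "air_temp": 20.0},
    {"station_id": 110010, "date": "2017-07-06", "hour": "10:00:00+00:00", "air_temp": None},
    {"station_id": 110010, "date": "2017-07-06", "hour": "11:00:00+00:00", "air_temp": 22.0},
    {"station_id": 112400, "date": "2017-07-06", "hour": "09:00:00+00:00", "air_temp": 30.0},
]
start = datetime(2017, 7, 6, tzinfo=timezone.utc)
end = datetime(2017, 7, 7, tzinfo=timezone.utc)
average = mean_air_temp(rows, 110010, start, end)
print(average)
```

The function returns None when no usable reading falls inside the window, which matters for sparsely populated columns.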

Metadata

n/a

Table 23. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Column Name	Datatype	Descriptor	Unit
station_id	Integer number	pms:recordID [0.0.RCRDD344]	n/a
date	Date	iso-8601:calendarDate [0.0.DATEA317]	n/a
hour	String	iso-8601:clock hour [0.0.HRFDY386]	n/a
air_temp	Decimal number	DecimalNumber [0.0.DCMLN314]	n/a
dew_point	Decimal number	DecimalNumber [0.0.DCMLN314]	n/a
pressure	Decimal number	pms:atmosphericPressure [0.0.TMSPH396]
wind_dir	Integer number	pms:windDirection [0.0.WNDDR475]	°
wind_spd	Decimal number	pms:windSpeed [0.0.WNDSP474]	m s-1
sky_cond	Integer number	Integer [0.0.NTGER313]	n/a
precip_1hr	Decimal number	DecimalNumber [0.0.DCMLN314]	n/a
precip_6hr	Decimal number	DecimalNumber [0.0.DCMLN314]	n/a

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 24. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
station_id	6 - 6	111,896.2	110,010	110,600	112,130	112,960	113,900	1,325,815	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	72 ( 0.0% )
date	10 - 10	2,018.0	2012-01-01	2,017	2,018	2,019	2020-12-07	1,325,815	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	3,244 ( 0.2% )
hour	14 - 14	n/a	15:00:00+00:…	n/a	n/a	n/a	14:00:00+00:…	1,325,815	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	24 ( 0.0% )
air_temp	1 - 5	10.15	-23	3.4	10	16.6	38.6	1,325,815	3,907 ( 0.3% )	8,733 ( 0.7% )	0 ( 0.0% )	593 ( 0.0% )
dew_point	1 - 5	5.28	-31	0.2	5.4	11.2	27.5	1,325,815	4,974 ( 0.4% )	8,678 ( 0.7% )	0 ( 0.0% )	503 ( 0.0% )
pressure	3 - 6	1,017.71	943.9	1,012.6	1,017.5	1,022.8	1,050.9	1,325,815	228,763 ( 17.3% )	0 ( 0.0% )	0 ( 0.0% )	714 ( 0.1% )
wind_dir	1 - 3	194.9	0	90	220	290	360	1,325,815	53,660 ( 4.0% )	104,854 ( 7.9% )	0 ( 0.0% )	38 ( 0.0% )
wind_spd	1 - 4	2.24	0	1	2	3	28	1,325,815	102,751 ( 7.8% )	61,122 ( 4.6% )	0 ( 0.0% )	60 ( 0.0% )
sky_cond	1 - 1	5.0	0	2	6	8	9	1,325,815	1,175,710 ( 88.7% )	18,374 ( 1.4% )	0 ( 0.0% )	11 ( 0.0% )
precip_1hr	1 - 4	0.23	-1	-1	0.1	0.5	69	1,325,815	1,148,579 ( 86.6% )	34,978 ( 2.6% )	0 ( 0.0% )	65 ( 0.0% )
precip_6hr	1 - 4	0.94	-1	0	0	0.3	99	1,325,815	1,250,100 ( 94.3% )	47,825 ( 3.6% )	0 ( 0.0% )	91 ( 0.0% )

Quality measures

Table 25. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
station_id	100.00%	0.01%	112400	112280
date	100.00%	0.24%	2017-11-19	2015-04-08
hour	100.00%	0.00%	06:00:00+00:00	02:00:00+00:00
air_temp	99.71%	0.04%	14	37.5
dew_point	99.62%	0.04%	0	25.9
pressure	82.75%	0.05%	null	976.8
wind_dir	95.95%	0.00%	360	10
wind_spd	92.25%	0.00%	1	24
sky_cond	11.32%	0.00%	null	9
precip_1hr	13.37%	0.00%	null	69
precip_6hr	5.71%	0.01%	null	49

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

yard

Table 26. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).

Parameter	Content
Unique identifier	VRRMN16.YARDA146.0
Name	yard
Target IRI	https://app.pollinatorhub.eu/dataset-discovery/parts/VRRMN16.YARDA146.0
Table Type	File
Licence	CC BY-SA 4.0
Description	The table contains data on the apiaries (yards) where the sampled beehives were kept at the time of Varroa sampling. There are 242 unique yard_id values. Each bee yard has been matched to the closest weather station, and the station_id column links the yard to the weather data.
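
Because each yard carries the identifier of its closest weather station, sampling events can be linked to weather data in two steps: sampling event to yard via yard_id, then yard to station via station_id. The Python sketch below shows the first step on made-up rows. It converts yard_id to an integer on both sides because this report types the column as String in the Varroa sampling table and as Integer in the yard table; the function name and the example pairings are hypothetical.

```python
def attach_station(sampling_rows, yard_rows):
    """Add the matched station_id to each sampling row (None if yard is unknown)."""
    station_by_yard = {int(y["yard_id"]): y["station_id"] for y in yard_rows}
    joined = []
    for sample in sampling_rows:
        station_id = station_by_yard.get(int(sample["yard_id"]))
        joined.append({**sample, "station_id": station_id})
    return joined


# Made-up rows; the yard-to-station pairings are illustrative only.
yards = [
    {"yard_id": 73, "station_id": 110010},
    {"yard_id": 87, "station_id": 112400},
]
samples = [
    {"sampling_id": "1", "yard_id": "87"},
    {"sampling_id": "2", "yard_id": "999"},
]
linked = attach_station(samples, yards)
print(linked)
```

Rows whose yard is not found keep a station_id of None, so unmatched records stay visible instead of being dropped silently.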

Metadata

n/a

Table 27. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Column Name	Column Description	Datatype	Descriptor	Unit
yard_id	The yard identifier	Integer number	Integer [0.0.NTGER313]	n/a
elevation	Meters above Sea Level rounded to the nearest meter	Integer number	pms:heightAboveMeanSeaLevel [0.0.HGHTB393]	m
nuts	NUTS is a geocode standard for referencing the administrative divisions of countries for statistical purposes. AT1 - East Austria: Burgenland (AT11), Lower Austria (AT12), Vienna (AT13); AT2 - South Austria: Carinthia (AT21), Styria (AT22); AT3 - West Austria: Upper Austria (AT31), Salzburg (AT32), Tyrol (AT33), Vorarlberg (AT34). The current Nomenclature of Territorial Units for Statistics (NUTS) adopted by the European Union (Commission Delegated Regulation 2019/1755) is applied.	String	eurostat:nuts2021Code [0.0.NTSCD55]	n/a
station_id	The NOAA weather station identifier	Integer number	Integer [0.0.NTGER313]	n/a
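Because NUTS codes are hierarchical, a five-character value in the nuts column can be split into its country, NUTS 1, NUTS 2 and NUTS 3 prefixes. The sketch below shows one way to do this. The region names are taken from the column description, with Tyrol keyed as AT33, its code in the NUTS classification. The names nuts_levels and nuts2_name are hypothetical helpers.

```python
# NUTS 2 region names for Austria, keyed by their NUTS 2 code.
NUTS2_NAMES = {
    "AT11": "Burgenland", "AT12": "Lower Austria", "AT13": "Vienna",
    "AT21": "Carinthia", "AT22": "Styria",
    "AT31": "Upper Austria", "AT32": "Salzburg",
    "AT33": "Tyrol", "AT34": "Vorarlberg",
}

def nuts_levels(code):
    """Split a NUTS 3 code such as 'AT125' into (country, NUTS 1, NUTS 2, NUTS 3)."""
    if len(code) != 5 or not code[:2].isalpha():
        raise ValueError(f"not a NUTS 3 code: {code!r}")
    return code[:2], code[:3], code[:4], code

def nuts2_name(code):
    """Look up the NUTS 2 region name for a NUTS 3 code, or 'unknown'."""
    return NUTS2_NAMES.get(nuts_levels(code)[2], "unknown")

print(nuts_levels("AT125"), nuts2_name("AT125"))
```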

Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 28. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
yard_id	2 - 3	446.0	73	370.75	453.5	586.25	664	242	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	242 ( 100.0% )
elevation	3 - 4	510.0	150	324	450	637.75	1,413	242	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	164 ( 67.8% )
nuts	5 - 5	n/a	AT125	n/a	n/a	n/a	AT127	242	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	32 ( 13.2% )
station_id	6 - 6	111,866.0	110,010	110,600	111,750	112,960	113,900	242	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	73 ( 30.2% )
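The measures in the table above can be approximated for any numeric column with the Python standard library. The report does not state which quantile method was used, so the inclusive linear interpolation chosen here is an assumption, and the function name describe and the sample values are hypothetical.

```python
import statistics

def describe(values):
    """Summarise a column given as raw strings (None marks a missing value)."""
    present = [v for v in values if v is not None and v.strip() != ""]
    numbers = sorted(float(v) for v in present)
    # Assumed quantile method: inclusive linear interpolation.
    q1, median, q3 = statistics.quantiles(numbers, n=4, method="inclusive")
    return {
        "min": numbers[0], "q1": q1, "median": median, "q3": q3,
        "max": numbers[-1],
        "mean": statistics.fmean(numbers),
        "total": len(values),
        "missing": sum(v is None for v in values),
        "blank": sum(v is not None and v.strip() == "" for v in values),
        "zero": sum(x == 0 for x in numbers),
        # Distinct counts NULL as one value, as the table caption describes.
        "distinct": len(set(values)),
    }

summary = describe(["150", "324", "450", "450", "1413", None])
print(summary)
```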

Quality measures

Table 29. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
yard_id	100.00%	100.00%	73	73
elevation	100.00%	67.77%	450	194
nuts	100.00%	13.22%	AT221	AT314
station_id	100.00%	30.17%	111750	112440
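The report does not define completeness and uniqueness formally. The reported figures are consistent with completeness being the share of non-missing values and uniqueness being the number of distinct values divided by the number of records (for example 32 / 242 = 13.22% for nuts). Under that assumption, a minimal sketch with hypothetical function names is:

```python
def completeness(values):
    """Share of non-missing values, in percent."""
    return 100.0 * sum(v is not None for v in values) / len(values)

def uniqueness(values):
    """Share of distinct values (None counted as one value), in percent."""
    return 100.0 * len(set(values)) / len(values)

# Synthetic column with 32 distinct codes over 242 records, mirroring the nuts row.
nuts_like = [f"code{i % 32}" for i in range(242)]
print(round(completeness(nuts_like), 2), round(uniqueness(nuts_like), 2))  # 100.0 13.22
```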

Changes made to preparatory file

n/a

Changes made to data

n/a

Unresolved issues

n/a

References

Rubinigg M., MacDonald M., Davenport V., Hassler E., Hassan A., Shala-Mayrhofer V. et al. 2023 Predicting Varroa: Longitudinal Data, Micro Climate, and Proximity Closeness Useful for Predicting Varroa Infestations (I1.A1). Data & Analytics for Good. [2023-11-04] data-for-good.pubpub.org

Annex 1: Table column reports

Table: hive

Column: hive_id

Table 30. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	hive_id
Description	The hive identifier
Data type	Integer number
Descriptor	pms:beehiveID [UID:0.0.HVEID216]
Descriptor description	A beehive ID is a unique sequence of characters associated with a beehive, which is specific to a dataset, to an apiary or to a beekeeper.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.HVEID216
Unit	n/a

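Table 31. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).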

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
hive_id	1 - 4	1,187.7	1	562.25	1,187.5	1,788.75	2,501	2,116	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2,116 ( 100.0% )

Table 32. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
hive_id	100.00%	100.00%	1	1

Continuous Data Distribution

Figure 1. Distribution of values in the column.

Outliers

Figure 2. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 3. Visualization of completeness of the data in the column.

Uniqueness

Figure 4. Visualization of uniqueness of the data in the column.

Column: user_id

Table 33. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	user_id
Description	The user identifier
Data type	Integer number
Descriptor	pms:userID [UID:0.0.SERID483]
Descriptor description	A user is a person who utilizes a computer or network service. A user often has a user account and is identified to the system by a username (or user name).
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.SERID483
Unit	n/a

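Table 34. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).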

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
user_id	2 - 4	8,001.2	10	8,383	8,418	8,509	9,128	2,116	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	103 ( 4.9% )

Table 35. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
user_id	100.00%	4.87%	8418	8310

Continuous Data Distribution

Figure 5. Distribution of values in the column.

Outliers

Figure 6. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 7. Visualization of completeness of the data in the column.

Uniqueness

Figure 8. Visualization of uniqueness of the data in the column.

Table: station

Column: station_id

Table 36. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	station_id
Description	The NOAA weather station identifier
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	n/a

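Table 37. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).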

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
station_id	6 - 6	112,055.1	110,010	110,825	112,200	113,030	113,900	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	73 ( 100.0% )

Table 38. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
station_id	100.00%	100.00%	110010	110010

Continuous Data Distribution

Figure 9. Distribution of values in the column.

Outliers

Figure 10. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 11. Visualization of completeness of the data in the column.

Uniqueness

Figure 12. Visualization of uniqueness of the data in the column.

Column: station_title

Table 39. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	station_title
Description	The NOAA weather station name
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

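Table 40. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).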

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
station_title	4 - 26	n/a	LINZ	n/a	n/a	n/a	KRUMBACH ID…	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	73 ( 100.0% )

Table 41. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
station_title	100.00%	100.00%	WOLFSEGG	WOLFSEGG

Completeness

Figure 13. Visualization of completeness of the data in the column.

Uniqueness

Figure 14. Visualization of uniqueness of the data in the column.

Column: latitude

Table 42. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	latitude
Description	Latitude coordinates of the station in decimal degrees in WGS84 standard.
Data type	Decimal number
Descriptor	dwc:decimalLatitude [UID:0.0.LTTDE333]
Descriptor description	The geographic latitude (in decimal degrees, using the spatial reference system given in dwc:geodeticDatum) of the geographic center of a dcterms:Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive.
Descriptor target IRI	http://rs.tdwg.org/dwc/terms/decimalLatitude
Unit	°

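Table 43. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).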

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
latitude	2 - 6	47.5721	46.617	47.075	47.45	48.175	48.683	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	56 ( 76.7% )

Table 44. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
latitude	100.00%	76.71%	48.567	48.1

Continuous Data Distribution

Figure 15. Distribution of values in the column.

Outliers

Figure 16. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 17. Visualization of completeness of the data in the column.

Uniqueness

Figure 18. Visualization of uniqueness of the data in the column.

Column: longitude

Table 45. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	longitude
Description	Longitude coordinates of the station in decimal degrees in WGS84 standard.
Data type	Decimal number
Descriptor	dwc:decimalLongitude [UID:0.0.LNGTD332]
Descriptor description	The geographic longitude (in decimal degrees, using the spatial reference system given in dwc:geodeticDatum) of the geographic center of a dcterms:Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive.
Descriptor target IRI	http://rs.tdwg.org/dwc/terms/decimalLongitude
Unit	°

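Table 46. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).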

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
longitude	2 - 6	14.4541	9.617	13.558	14.744	15.7665	16.6	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	66 ( 90.4% )

Table 47. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
longitude	100.00%	90.41%	16.367	13.667
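The station coordinates make it possible to check the legal ranges named in the descriptors and to compute distances, which is the basis of any closest-station matching. The sketch below uses the haversine formula. It is not necessarily the method used to build the dataset. The yard table itself carries no coordinates, so the yard position, the station coordinates and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant

def check_wgs84(lat, lon):
    """Raise ValueError if the pair is outside the legal decimal-degree ranges."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    check_wgs84(lat1, lon1)
    check_wgs84(lat2, lon2)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest(point, stations):
    """Return the key of the station in `stations` that is closest to `point`."""
    return min(stations, key=lambda sid: haversine_km(*point, *stations[sid]))

# Hypothetical station coordinates (station_id -> (latitude, longitude)).
stations = {110010: (48.2, 14.2), 112400: (47.0, 15.4)}
print(nearest((48.0, 14.0), stations))
```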

Continuous Data Distribution

Figure 19. Distribution of values in the column.

Outliers

Figure 20. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 21. Visualization of completeness of the data in the column.

Uniqueness

Figure 22. Visualization of uniqueness of the data in the column.

Column: station_elevation

Table 48. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	station_elevation
Description	Meters above Sea Level
Data type	Decimal number
Descriptor	pms:heightAboveMeanSeaLevel [UID:0.0.HGHTB393]
Descriptor description	Height above mean sea level is a measure of the vertical distance (height, elevation or altitude) of a location in reference to a historic mean sea level taken as a vertical datum. In geodesy, it is formalized as orthometric heights.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.HGHTB393
Unit	m

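Table 49. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).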

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
station_elevation	3 - 6	536.72	153	306.05	486	715.7	1,209.7	73	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	73 ( 100.0% )

Table 50. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
station_elevation	100.00%	100.00%	615.6	615.6

Continuous Data Distribution

Figure 23. Distribution of values in the column.

Outliers

Figure 24. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 25. Visualization of completeness of the data in the column.

Uniqueness

Figure 26. Visualization of uniqueness of the data in the column.

Table: user

Column: user_id

Table 51. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	user_id
Description	The user identifier
Data type	Integer number
Descriptor	pms:userID [UID:0.0.SERID483]
Descriptor description	A user is a person who utilizes a computer or network service. A user often has a user account and is identified to the system by a username (or user name).
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.SERID483
Unit	n/a

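Table 52. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).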

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
user_id	2 - 4	8,403.9	10	8,354	8,411	8,617	9,128	99	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	99 ( 100.0% )

Table 53. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
user_id	100.00%	100.00%	10	10
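The hive table reports 103 distinct user_id values, whereas the user table holds 99 records. A simple key comparison can list identifiers that appear in one table but not in the other. The sketch below uses hypothetical identifiers and a hypothetical helper name, orphan_keys. It makes no claim about which identifiers actually differ in the dataset.

```python
def orphan_keys(child_keys, parent_keys):
    """Return child key values that have no match in the parent table, sorted."""
    return sorted(set(child_keys) - set(parent_keys))

# Hypothetical identifiers for illustration only.
hive_user_ids = [10, 8418, 8418, 8310, 9128, 7777]
user_ids = [10, 8310, 8418, 9128]
print(orphan_keys(hive_user_ids, user_ids))  # [7777]
```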

Continuous Data Distribution

Figure 27. Distribution of values in the column.

Outliers

Figure 28. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 29. Visualization of completeness of the data in the column.

Uniqueness

Figure 30. Visualization of uniqueness of the data in the column.

Column: samples

Table 54. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	samples
Description	Total number of samples provided by a user
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	no.

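Table 55. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).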

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
samples	1 - 4	111.6	1	10	25	96	3,058	99	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	65 ( 65.7% )

Table 56. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
samples	100.00%	65.66%	10	3058

Continuous Data Distribution

Figure 31. Distribution of values in the column.

Outliers

Figure 32. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 33. Visualization of completeness of the data in the column.

Uniqueness

Figure 34. Visualization of uniqueness of the data in the column.

Table: Varroa sampling

Column: sampling_id

Table 57. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	sampling_id
Description	The sampling event identifier
Data type	String
Descriptor	dwc:materialSampleID [UID:0.0.MTRLS489]
Descriptor description	An identifier for a material sample.
Descriptor target IRI	http://rs.tdwg.org/dwc/terms/materialSampleID
Unit	n/a

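Table 58. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).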

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
sampling_id	1 - 5	5,788.6	1	2,953.25	5,828.5	8,625.75	11,427	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	11,124 ( 100.0% )

Table 59. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
sampling_id	100.00%	100.00%	1	1

Continuous Data Distribution

Figure 35. Distribution of values in the column.

Outliers

Figure 36. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 37. Visualization of completeness of the data in the column.

Uniqueness

Figure 38. Visualization of uniqueness of the data in the column.

Column: date_from

Table 60. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	date_from
Description	The first date (year, month, day) and time (hours, minutes) of the sampling event
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

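Table 61. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).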

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
date_from	11 - 14	n/a	7/6/17 9:00	n/a	n/a	n/a	10/20/20 12:…	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	1,903 ( 17.1% )

Table 62. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
date_from	100.00%	17.11%	4/4/20 8:00	11/17/17 15:20

Completeness

Figure 39. Visualization of completeness of the data in the column.

Uniqueness

Figure 40. Visualization of uniqueness of the data in the column.

Column: date_to

Table 63. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	date_to
Description	The final date (year, month, day) and time (hours, minutes) of the sampling event
Data type	String
Descriptor	Text [UID:0.0.TEXTA315]
Descriptor description	In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit	n/a

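Table 64. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).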

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
date_to	11 - 14	n/a	8/1/17 7:30	n/a	n/a	n/a	11/10/20 12:…	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2,227 ( 20.0% )

Table 65. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
date_to	100.00%	20.02%	4/4/20 11:00	4/20/17 17:20
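Both date_from and date_to are stored as strings. The sample values (for example 10/20/20 and 11/17/17) suggest a month/day/two-digit-year layout followed by a 24-hour time, without zero padding. Assuming that layout, and noting that the report gives no time zone for these columns, the values can be parsed as sketched below. The function names are hypothetical, and the pairing of the two example timestamps is illustrative rather than taken from one record.

```python
from datetime import datetime

# Assumed layout: month/day/two-digit year, then hours:minutes (24-hour clock).
DATE_FORMAT = "%m/%d/%y %H:%M"

def parse_sampling_time(text):
    """Parse a date_from / date_to string into a naive datetime."""
    return datetime.strptime(text.strip(), DATE_FORMAT)

def sampling_duration_hours(date_from, date_to):
    """Length of a sampling event in hours."""
    delta = parse_sampling_time(date_to) - parse_sampling_time(date_from)
    return delta.total_seconds() / 3600.0

print(parse_sampling_time("7/6/17 9:00").isoformat())          # 2017-07-06T09:00:00
print(sampling_duration_hours("4/4/20 8:00", "4/4/20 11:00"))  # 3.0
```

Parsing into datetime objects also gives chronological ordering, which plain string comparison of this layout does not provide.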

Completeness

Figure 41. Visualization of completeness of the data in the column.

Uniqueness

Figure 42. Visualization of uniqueness of the data in the column.

Column: varroa_count

Table 66. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	varroa_count
Description	The number of varroa mites found in the sampling event
Data type	Decimal number
Descriptor	pms:naturalVarroaMiteFall [UID:0.0.NMBRF371]
Descriptor description	The infestation rate of adult honey bee colonies with Varroa mites (Varroa destructor), measured as natural mite fall on a sticky board placed under the brood nest of a honey bee colony, expressed in number of Varroa mites per day.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NMBRF371
Unit	mites d⁻¹

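Table 67. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).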

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
varroa_count	1 - 4	28.3	0	1	4	16	5,016	11,124	0 ( 0.0% )	2,301 ( 20.7% )	0 ( 0.0% )	387 ( 3.5% )

Table 68. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
varroa_count	100.00%	3.48%	0	700
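The column description speaks of the number of mites found in a sampling event, while the descriptor is expressed in mites per day. The report does not say whether the stored values are already normalised by the sampling interval. If a per-day rate has to be derived from a raw count, one possible sketch is shown below. It assumes the month/day/year layout of date_from and date_to, and the function name mites_per_day and the example values are hypothetical.

```python
from datetime import datetime

def mites_per_day(count, date_from, date_to, fmt="%m/%d/%y %H:%M"):
    """Divide a raw mite count by the sampling interval expressed in days."""
    start = datetime.strptime(date_from, fmt)
    end = datetime.strptime(date_to, fmt)
    days = (end - start).total_seconds() / 86400.0
    if days <= 0:
        raise ValueError("date_to must be later than date_from")
    return count / days

# Hypothetical example: 16 mites collected over a four-day interval.
print(mites_per_day(16, "7/6/17 9:00", "7/10/17 9:00"))  # 4.0
```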

Continuous Data Distribution

Figure 43. Distribution of values in the column.

Outliers

Figure 44. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 45. Visualization of completeness of the data in the column.

Uniqueness

Figure 46. Visualization of uniqueness of the data in the column.

Column: quality_control

Table 69. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	quality_control
Description	The quality level of the sample collected: 2 = examined with the BeeVS diagnostic system; 1 = examined manually by a trained group; 0 = examined manually by untrained individuals.
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	n/a

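Table 70. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).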

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
quality_control	1 - 1	0.9	0	0	1	2	2	11,124	0 ( 0.0% )	3,824 ( 34.4% )	0 ( 0.0% )	3 ( 0.0% )

Table 71. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
quality_control	100.00%	0.03%	1	2
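The three quality_control codes can be made explicit in analysis code so that filters read as intent rather than as bare integers. The following sketch maps the codes from the column description onto an enumeration. The class name, member names and the helper keep_reliable are hypothetical, and the sample records are invented for illustration.

```python
from enum import IntEnum

class QualityControl(IntEnum):
    """Quality levels as listed in the quality_control column description."""
    UNTRAINED_MANUAL = 0  # examined manually by untrained individuals
    TRAINED_MANUAL = 1    # examined manually by a trained group
    BEEVS = 2             # examined with the BeeVS diagnostic system

def keep_reliable(records, minimum=QualityControl.TRAINED_MANUAL):
    """Keep records whose quality level is at least `minimum`."""
    return [
        r for r in records
        if QualityControl(int(r["quality_control"])) >= minimum
    ]

# Hypothetical records for illustration only.
records = [
    {"sampling_id": "1", "quality_control": "0"},
    {"sampling_id": "2", "quality_control": "2"},
    {"sampling_id": "3", "quality_control": "1"},
]
print([r["sampling_id"] for r in keep_reliable(records)])  # ['2', '3']
```

Constructing the enumeration from each value also surfaces unexpected codes, because a value outside 0 to 2 raises a ValueError.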

Data Distribution Top 20

Figure 47. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 48. Distribution of values in the column.

Outliers

Figure 49. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 50. Visualization of completeness of the data in the column.

Uniqueness

Figure 51. Visualization of uniqueness of the data in the column.

Column: hive_id

Table 72. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	hive_id
Description	The hive identifier
Data type	String
Descriptor	pms:beehiveID [UID:0.0.HVEID216]
Descriptor description	A beehive ID is a unique sequence of characters associated with a beehive, which is specific to a dataset, to an apiary or to a beekeeper.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.HVEID216
Unit	n/a

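Table 73. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).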

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
hive_id	1 - 4	820.6	1	289	715	1,199	2,501	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	2,116 ( 19.0% )

Table 74. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
hive_id	100.00%	19.02%	943	431

Continuous Data Distribution

Figure 52. Distribution of values in the column.

Outliers

Figure 53. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 54. Visualization of completeness of the data in the column.

Uniqueness

Figure 55. Visualization of uniqueness of the data in the column.

Column: yard_id

Table 75. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	yard_id
Description	The yard identifier
Data type	String
Descriptor	pms:apiaryID [UID:0.0.PRYID342]
Descriptor description	An apiary ID is a unique sequence of characters associated with an apiary, which is specific to a dataset or to a beekeeper.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.PRYID342
Unit	n/a

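Table 76. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).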

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
yard_id	2 - 3	370.5	73	194	391	522	664	11,124	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	242 ( 2.2% )

Table 77. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
yard_id	100.00%	2.18%	87	404

Continuous Data Distribution

Figure 56. Distribution of values in the column.

Outliers

Figure 57. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 58. Visualization of completeness of the data in the column.

Uniqueness

Figure 59. Visualization of uniqueness of the data in the column.

Table: weather

Column: station_id

Table 78. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	station_id
Description
Data type	Integer number
Descriptor	pms:recordID [UID:0.0.RCRDD344]
Descriptor description	Unique sequence of integers associated with a record within a certain table.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.RCRDD344
Unit	n/a

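Table 79. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).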

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
station_id	6 - 6	111,896.2	110,010	110,600	112,130	112,960	113,900	1,325,815	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	72 ( 0.0% )

Table 80. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
station_id	100.00%	0.01%	112400	112280

Data Distribution Top 20

Figure 60. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 61. Distribution of 20 least common values, from lowest to highest.

Outliers

Figure 62. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 63. Visualization of completeness of the data in the column.

Uniqueness

Figure 64. Visualization of uniqueness of the data in the column.

Column: date

Table 81. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	date
Description
Data type	Date
Descriptor	iso-8601:calendarDate [UID:0.0.DATEA317]
Descriptor description	particular calendar day [...] represented by its calendar year [...], its calendar month [...] and its calendar day of month [...]
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DATEA317
Unit	n/a

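Table 82. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).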

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
date	10 - 10	2018.0	2012-01-01	2017	2018	2019	2020-12-07	1,325,815	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	3,244 ( 0.2% )

Table 83. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
date	100.00%	0.24%	2017-11-19	2015-04-08

Outliers

Figure 65. Visualization of median, min, max, and outliers in the column.

No data available.

Completeness

Figure 66. Visualization of completeness of the data in the column.

Uniqueness

Figure 67. Visualization of uniqueness of the data in the column.

Column: hour

Table 84. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	hour
Description
Data type	String
Descriptor	iso-8601:clock hour [UID:0.0.HRFDY386]
Descriptor description	time scale unit [...] whose duration [...] is one hour [...] Clock hour is in common parlance often referred to as hour, however in this document clock hour and hour have different definitions.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.HRFDY386
Unit	n/a

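Table 85. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).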

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
hour	14 - 14	n/a	15:00:00+00:…	n/a	n/a	n/a	14:00:00+00:…	1,325,815	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	24 ( 0.0% )

Table 86. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
hour	100.00%	0.00%	06:00:00+00:00	02:00:00+00:00
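
Because the weather table stores the calendar date and the clock hour in separate columns, a reader will usually want to merge them into one timezone-aware timestamp. The sketch below assumes the formats shown in the tables above (`YYYY-MM-DD` for `date`, `hh:mm:ss+00:00` for `hour`); the function name is illustrative.

```python
from datetime import datetime, timezone


def to_timestamp(date_str: str, hour_str: str) -> datetime:
    """Join a `date` value and an `hour` value into one aware datetime.

    Assumes ISO 8601 formatting in both columns, as the reported
    sample values suggest.
    """
    return datetime.fromisoformat(f"{date_str}T{hour_str}")


# Most common values reported for the two columns.
ts = to_timestamp("2017-11-19", "06:00:00+00:00")
print(ts.isoformat())  # 2017-11-19T06:00:00+00:00
```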

Data Distribution Top 20

Figure 68. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 69. Distribution of 20 least common values, from lowest to highest.

Completeness

Figure 70. Visualization of completeness of the data in the column.

Uniqueness

Figure 71. Visualization of uniqueness of the data in the column.

Column: air_temp

Table 87. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	air_temp
Description
Data type	Decimal number
Descriptor	DecimalNumber [UID:0.0.DCMLN314]
Descriptor description	Any of the rational or irrational numbers.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DCMLN314
Unit	n/a

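Table 88. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).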

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
air_temp	1 - 5	10.15	-23	3.4	10	16.6	38.6	1,325,815	3,907 ( 0.3% )	8,733 ( 0.7% )	0 ( 0.0% )	593 ( 0.0% )

Table 89. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
air_temp	99.71%	0.04%	14	37.5

Data Distribution Top 20

Figure 72. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 73. Distribution of 20 least common values, from lowest to highest.

Outliers

Figure 74. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 75. Visualization of completeness of the data in the column.

Uniqueness

Figure 76. Visualization of uniqueness of the data in the column.

Column: dew_point

Table 90. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	dew_point
Description
Data type	Decimal number
Descriptor	DecimalNumber [UID:0.0.DCMLN314]
Descriptor description	Any of the rational or irrational numbers.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DCMLN314
Unit	n/a

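Table 91. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).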

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
dew_point	1 - 5	5.28	-31	0.2	5.4	11.2	27.5	1,325,815	4,974 ( 0.4% )	8,678 ( 0.7% )	0 ( 0.0% )	503 ( 0.0% )

Table 92. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
dew_point	99.62%	0.04%	0	25.9

Data Distribution Top 20

Figure 77. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 78. Distribution of 20 least common values, from lowest to highest.

Outliers

Figure 79. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 80. Visualization of completeness of the data in the column.

Uniqueness

Figure 81. Visualization of uniqueness of the data in the column.

Column: pressure

Table 93. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	pressure
Description
Data type	Decimal number
Descriptor	pms:atmosphericPressure [UID:0.0.TMSPH396]
Descriptor description	Atmospheric pressure, also known as air pressure or barometric pressure (after the barometer), is the pressure within the atmosphere of Earth.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TMSPH396
Unit

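Table 94. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).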

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
pressure	3 - 6	1,017.71	943.9	1,012.6	1,017.5	1,022.8	1,050.9	1,325,815	228,763 ( 17.3% )	0 ( 0.0% )	0 ( 0.0% )	714 ( 0.1% )

Table 95. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
pressure	82.75%	0.05%	null	976.8

Data Distribution Top 20

Figure 82. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 83. Distribution of 20 least common values, from lowest to highest.

Outliers

Figure 84. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 85. Visualization of completeness of the data in the column.

Uniqueness

Figure 86. Visualization of uniqueness of the data in the column.

Column: wind_dir

Table 96. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	wind_dir
Description
Data type	Integer number
Descriptor	pms:windDirection [UID:0.0.WNDDR475]
Descriptor description	The true direction from which the wind is blowing at a given location (i.e., wind blowing from the north to the south is a north wind). It is normally measured in tens of degrees from 10 degrees clockwise through 360 degrees. North is 360 degrees. A wind direction of 0 degrees is only used when wind is calm.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.WNDDR475
Unit	°

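Table 97. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).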

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
wind_dir	1 - 3	194.9	0	90	220	290	360	1,325,815	53,660 ( 4.0% )	104,854 ( 7.9% )	0 ( 0.0% )	38 ( 0.0% )

Table 98. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
wind_dir	95.95%	0.00%	360	10
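
According to the descriptor, a wind direction of 0 degrees marks calm conditions and north is recorded as 360 degrees. An arithmetic mean such as the 194.9 reported above therefore mixes calm records with real directions and ignores the wrap-around at north. A minimal sketch of a circular mean that skips calm and missing records follows; the function name and the choice to return `None` when no direction dominates are assumptions made for illustration.

```python
import math
from typing import Iterable, Optional


def circular_mean_deg(directions: Iterable[Optional[float]]) -> Optional[float]:
    """Mean wind direction in degrees, in the interval [0, 360).

    Records equal to 0 (calm, per the descriptor) and None (missing)
    are ignored. Returns None if nothing remains or if the remaining
    directions cancel each other out.
    """
    observed = [d for d in directions if d is not None and d != 0]
    if not observed:
        return None
    s = sum(math.sin(math.radians(d)) for d in observed)
    c = sum(math.cos(math.radians(d)) for d in observed)
    if math.hypot(s, c) < 1e-9:
        return None  # e.g. equal amounts of east and west wind
    return math.degrees(math.atan2(s, c)) % 360


# 350 and 10 degrees average to north, not to 180 degrees.
mean = circular_mean_deg([350, 10, 0, None])
```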

Data Distribution Top 20

Figure 87. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 88. Distribution of 20 least common values, from lowest to highest.

Outliers

Figure 89. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 90. Visualization of completeness of the data in the column.

Uniqueness

Figure 91. Visualization of uniqueness of the data in the column.

Column: wind_spd

Table 99. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	wind_spd
Description
Data type	Decimal number
Descriptor	pms:windSpeed [UID:0.0.WNDSP474]
Descriptor description	The rate at which air is moving horizontally past a given point. It may be a 2-minute average speed (reported as wind speed) or an instantaneous speed (reported as a peak wind speed, wind gust, or squall).
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.WNDSP474
Unit	m s⁻¹

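Table 100. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).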

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
wind_spd	1 - 4	2.24	0	1	2	3	28	1,325,815	102,751 ( 7.8% )	61,122 ( 4.6% )	0 ( 0.0% )	60 ( 0.0% )

Table 101. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
wind_spd	92.25%	0.00%	1	24

Data Distribution Top 20

Figure 92. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 93. Distribution of 20 least common values, from lowest to highest.

Outliers

Figure 94. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 95. Visualization of completeness of the data in the column.

Uniqueness

Figure 96. Visualization of uniqueness of the data in the column.

Column: sky_cond

Table 102. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	sky_cond
Description
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	n/a

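Table 103. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).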

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
sky_cond	1 - 1	5.0	0	2	6	8	9	1,325,815	1,175,710 ( 88.7% )	18,374 ( 1.4% )	0 ( 0.0% )	11 ( 0.0% )

Table 104. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
sky_cond	11.32%	0.00%	null	9

Data Distribution Top 20

Figure 97. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 98. Distribution of values in the column.

Outliers

Figure 99. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 100. Visualization of completeness of the data in the column.

Uniqueness

Figure 101. Visualization of uniqueness of the data in the column.

Column: precip_1hr

Table 105. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	precip_1hr
Description
Data type	Decimal number
Descriptor	DecimalNumber [UID:0.0.DCMLN314]
Descriptor description	Any of the rational or irrational numbers.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DCMLN314
Unit	n/a

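Table 106. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).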

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
precip_1hr	1 - 4	0.23	-1	-1	0.1	0.5	69	1,325,815	1,148,579 ( 86.6% )	34,978 ( 2.6% )	0 ( 0.0% )	65 ( 0.0% )

Table 107. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
precip_1hr	13.37%	0.00%	null	69
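
The reported minimum and first quartile of -1 cannot be physical precipitation amounts, so -1 presumably acts as a sentinel value. The report does not document its meaning. The sketch below only separates such records from measured amounts so that they do not distort summary statistics; the flag names are hypothetical and the interpretation of -1 should be confirmed with the data provider.

```python
from typing import Optional, Tuple

SENTINEL = -1  # assumption: -1 is a code, not a measured amount


def classify_precip(value: Optional[float]) -> Tuple[Optional[float], str]:
    """Return (usable amount, flag) for one precipitation record."""
    if value is None:
        return (None, "missing")
    if value == SENTINEL:
        return (None, "sentinel")   # keep out of sums and means
    if value < 0:
        return (None, "invalid")    # any other negative value
    return (float(value), "measured")


records = [0.5, -1, None, 0, 69]
measured = [amount for amount, flag in map(classify_precip, records) if flag == "measured"]
print(measured)  # [0.5, 0.0, 69.0]
```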

Data Distribution Top 20

Figure 102. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 103. Distribution of 20 least common values, from lowest to highest.

Outliers

Figure 104. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 105. Visualization of completeness of the data in the column.

Uniqueness

Figure 106. Visualization of uniqueness of the data in the column.

Column: precip_6hr

Table 108. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	precip_6hr
Description
Data type	Decimal number
Descriptor	DecimalNumber [UID:0.0.DCMLN314]
Descriptor description	Any of the rational or irrational numbers.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.DCMLN314
Unit	n/a

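Table 109. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).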

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
precip_6hr	1 - 4	0.94	-1	0	0	0.3	99	1,325,815	1,250,100 ( 94.3% )	47,825 ( 3.6% )	0 ( 0.0% )	91 ( 0.0% )

Table 110. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
precip_6hr	5.71%	0.01%	null	49

Data Distribution Top 20

Figure 107. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 108. Distribution of 20 least common values, from lowest to highest.

Completeness

Figure 109. Visualization of completeness of the data in the column.

Uniqueness

Figure 110. Visualization of uniqueness of the data in the column.

Table: yard

Column: yard_id

Table 111. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	yard_id
Description	The yard identifier
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	n/a

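Table 112. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).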

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
yard_id	2 - 3	446.0	73	370.75	453.5	586.25	664	242	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	242 ( 100.0% )

Table 113. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
yard_id	100.00%	100.00%	73	73

Continuous Data Distribution

Figure 111. Distribution of values in the column.

Outliers

Figure 112. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 113. Visualization of completeness of the data in the column.

Uniqueness

Figure 114. Visualization of uniqueness of the data in the column.

Column: elevation

Table 114. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	elevation
Description	Meters above sea level, rounded to the nearest meter
Data type	Integer number
Descriptor	pms:heightAboveMeanSeaLevel [UID:0.0.HGHTB393]
Descriptor description	Height above mean sea level is a measure of the vertical distance (height, elevation or altitude) of a location in reference to a historic mean sea level taken as a vertical datum. In geodesy, it is formalized as orthometric heights.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.HGHTB393
Unit	m

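Table 115. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).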

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
elevation	3 - 4	510.0	150	324	450	637.75	1,413	242	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	164 ( 67.8% )

Table 116. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
elevation	100.00%	67.77%	450	194

Continuous Data Distribution

Figure 115. Distribution of values in the column.

Outliers

Figure 116. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 117. Visualization of completeness of the data in the column.

Uniqueness

Figure 118. Visualization of uniqueness of the data in the column.

Column: nuts

Table 117. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	nuts
Description	NUTS is a geocode standard for referencing the administrative divisions of countries for statistical purposes. AT1 - East Austria: Burgenland (AT11), Lower Austria (AT12), Vienna (AT13); AT2 - South Austria: Carinthia (AT21), Styria (AT22); AT3 - West Austria: Upper Austria (AT31), Salzburg (AT32), Tyrol (AT33), Vorarlberg (AT34). The current Nomenclature of Territorial Units for Statistics (NUTS) adopted by the European Union (Commission Delegated Regulation 2019/1755) is applied.
Data type	String
Descriptor	eurostat:nuts2021Code [UID:0.0.NTSCD55]
Descriptor description	A NUTS code defined in the NUTS classification 2021, valid from 2021-01-01 to 2023-12-31, containing 92 regions at NUTS level 1, 244 regions at NUTS level 2 and 1165 regions at NUTS level 3.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTSCD55
Unit	n/a

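Table 118. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).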

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
nuts	5 - 5	n/a	AT125	n/a	n/a	n/a	AT127	242	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	32 ( 13.2% )

Table 119. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
nuts	100.00%	13.22%	AT221	AT314
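
All values in this column are five characters long, which corresponds to NUTS level 3. Since NUTS codes are hierarchical, the coarser levels named in the column description can be derived by truncating the code. The sketch below illustrates this; the function name and the returned keys are illustrative.

```python
def nuts_levels(code: str) -> dict:
    """Split a five-character NUTS level 3 code into its parent levels."""
    if len(code) != 5 or not code.isalnum():
        raise ValueError(f"not a NUTS level 3 code: {code!r}")
    return {
        "country": code[:2],  # e.g. AT
        "nuts1": code[:3],    # e.g. AT2  (South Austria)
        "nuts2": code[:4],    # e.g. AT22 (Styria)
        "nuts3": code,        # e.g. AT221
    }


# Most common value reported for the column.
print(nuts_levels("AT221"))
```

Grouping yards by the `nuts2` key would aggregate them at the level of the Austrian federal states listed in the column description.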

Completeness

Figure 119. Visualization of completeness of the data in the column.

Uniqueness

Figure 120. Visualization of uniqueness of the data in the column.

Column: station_id

Table 120. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).

Parameter	Content
Column name	station_id
Description	The NOAA weather station identifier
Data type	Integer number
Descriptor	Integer [UID:0.0.NTGER313]
Descriptor description	A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI	https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit	n/a

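Table 121. Statistical analysis of the column. Column name (Column Name); range of value lengths in characters (Range); arithmetic mean (Mean); smallest value (Minimum); first quartile (Q₁); median (Median); third quartile (Q₃); largest value (Maximum); number of records (Total); number and share of missing values (Missing); number and share of zero values (Zero); number and share of blank values (Blank); number and share of distinct values (Distinct).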

Column Name	Range	Mean	Minimum	Q₁	Median	Q₃	Maximum	Total	Missing	Zero	Blank	Distinct
station_id	6 - 6	111,866.0	110,010	110,600	111,750	112,960	113,900	242	0 ( 0.0% )	0 ( 0.0% )	0 ( 0.0% )	73 ( 30.2% )

Table 122. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).

Column Name	Completeness	Uniqueness	Most Common Value	Least Common Value
station_id	100.00%	30.17%	111750	112440
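
The yard table references 73 distinct stations, whereas the weather table reports only 72 distinct `station_id` values, so at least one yard points to a station without weather records. A minimal sketch for listing such stations follows. It assumes that both tables have been loaded as rows keyed by column name (for example with `csv.DictReader`); the function name is illustrative.

```python
from typing import Iterable, Mapping, Set


def stations_without_weather(
    yard_rows: Iterable[Mapping[str, str]],
    weather_rows: Iterable[Mapping[str, str]],
) -> Set[str]:
    """Return station_id values used in yard but absent from weather."""
    yard_stations = {row["station_id"] for row in yard_rows}
    weather_stations = {row["station_id"] for row in weather_rows}
    return yard_stations - weather_stations


# Tiny illustrative input; the IDs are taken from the tables above,
# but their pairing here is made up for the example.
yard = [{"yard_id": "73", "station_id": "111750"},
        {"yard_id": "87", "station_id": "112440"}]
weather = [{"station_id": "111750", "date": "2017-11-19"}]
print(stations_without_weather(yard, weather))  # {'112440'}
```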

Continuous Data Distribution

Figure 121. Distribution of values in the column.

Outliers

Figure 122. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 123. Visualization of completeness of the data in the column.

Uniqueness

Figure 124. Visualization of uniqueness of the data in the column.