EU Pollinator Hub


Dataset Report
Unique identifier: PSTCD4.0.0
Title: Postcode
Long title: Postcodes from various countries
Status: Quality Validated
Current Version: v. 1.0
Published: 2023-09-13
Reviewed by:
Citation proposal:
EU Pollinator Hub 2023 Report of dataset Postcode, v. 1.0 [PSTCD4.0.0]. EU Pollinator Hub. [2026-02-24] app.pollinatorhub.eu
Compliance with FAIR* principles
Findable
Accessible
Interoperable
Reusable
See https://www.go-fair.org/fair-principles for more information about FAIR principles
Data Quality
Under evaluation

Document history

Release

Version v. 1.0 released on 2023-09-13.

Revision

Table 1. List of revisions made to the document. Identifier of revision (No); date of revision (Date); description of revision (Description); reason for revision (Reason).
No Date Description Reason
1 2023-09-13 00:09:00 Initial release. n/a

Abbreviations

No abbreviations.

Executive summary

Data overview:

Data was obtained from GeoNames (http://www.geonames.org/). It contains an incomplete collection of postcodes from 97 countries worldwide. It is licensed under a Creative Commons Attribution 4.0 license.

Data value:

Data was collected to be used internally on the EU Pollinator Platform (EUPH).

Data description:

The dataset contains 1 table with a total of 1.534.012 records (171.640.227 bytes).

Data application:

Data will be exclusively used for backend administration of the EU Pollinator Platform (EUPH), in particular for standardisation of data and for interactions with users.

Unresolved issues:

n/a

Introduction

Data was obtained from GeoNames (http://www.geonames.org/). It contains an incomplete collection of postcodes from 97 countries worldwide. It is licensed under a Creative Commons Attribution 4.0 license. It will be exclusively used for backend administration of the EU Pollinator Platform (EUPH), in particular for standardisation of data and for interactions with users.

Material and methods

Data acquisition

Data was obtained from the GeoNames geographical database (www.geonames.org), a project founded by Marc Wick and maintained by Unxos GmbH, Switzerland. The data is licensed under a Creative Commons Attribution 4.0 License. It can be used if credit is given to GeoNames (at least by a link to www.geonames.org). The data is provided without warranty or any representation of accuracy, timeliness or completeness.

Postcodes were obtained from the file allCountries.zip. Postcodes from Canada (CA), Great Britain (GB) and the Netherlands (NL) were substituted with the postcodes obtained from the files CA_full.csv.zip, GB_full.csv.zip and NL_full.csv.zip, respectively. At the time of data acquisition (2023-09-13), 97 countries were supported. For many countries, latitude and longitude are determined with an algorithm that searches for the place names in the main GeoNames database, using administrative divisions and numerical vicinity of the postal codes as factors in the disambiguation of place names. For postal codes and place names for which no corresponding toponym could be found in the main GeoNames database, an average latitude and longitude of 'neighbouring' postal codes is calculated. For copyright reasons, only the first digits of the full postal codes are provided for Chile, and only the first letters for Ireland and Malta. For Argentina, only the first 5 positions of the postal code are available, and for Brazil only major postal codes (the codes ending with -000 and the major code per municipality).

Table 2. List of raw data and metadata files included in the dataset. Identifier of table row (No); name of the file (File); the type of the file (Type); file contains data (D); file contains metadata (M); date of upload of the file to the EU Pollinator Hub (Arrival); number of data points contained within the file (if applicable); uploaded file size.
No File Type D M Arrival Data points File size
1 allcountries.csv CSV - Comma separated values Yes No 2025-09-30 12:09:03 18,609,576 163.69 MiB

Data preparation

Zipped files containing the raw data were unpacked. The unpacked raw data files contained tab-separated values and were converted to CSV format according to WI-002 (Raw data preparation) of SOP-006 (Dataset preparation), using the script ConvertTsv2Csv.py executed in the IDE PyCharm (Version 2023.1.2 Community Edition, JetBrains s.r.o). Data contained in the raw data files allCountries.zip, CA_full.csv.zip and NL_full.csv.zip were imported for profiling into an SQL database (MariaDB Foundation, server version 10.4.24) running in an XAMPP environment (BitRock, version 3.3.0). Data types of tables were configured using the information contained in the metadata file readme.txt.
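
The script ConvertTsv2Csv.py itself is not reproduced in this report. As a rough illustration of the conversion step only, a minimal sketch could look as follows. The function name convert_tsv_to_csv, the absence of quoting in the input and the NULL token for empty fields are assumptions (the last one based on the note under "Changes made to preparatory file"), not the documented behaviour of the actual script.

```python
import csv
import io


def convert_tsv_to_csv(tsv_text: str, null_token: str = "NULL") -> str:
    """Convert tab-separated text to comma-separated text.

    Hypothetical helper: empty fields are replaced with null_token.
    """
    # QUOTE_NONE: the input is assumed to be plain tab-separated text
    # without any quoting, so quote characters are kept literally.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # The writer quotes fields only when needed, e.g. a place name
    # that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([field if field != "" else null_token
                         for field in row])
    return out.getvalue()
```

Working on strings keeps the sketch independent of file paths; for the real files the same reader and writer could be wrapped around opened file handles.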

Data validation

n/a

Data analysis

n/a

Data description

Dataset

Table 3. Summary of tables belonging to the dataset. Table row identifier (No); name of the table (Table); description of the table (Description).
No Table Description
1 Worldwide postcodes The table contains 1.534.012 records (171.640.227 bytes) with postcodes from 97 distinct countries (PT, IN, JP, MX, SG, PE, PL,…
Table 4. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
interactions.single.uid PSTCD4.0.0
Title Postcode
Long title Postcodes from various countries
Target IRI https://app.pollinatorhub.eu/dataset-discovery/PSTCD4.0.0
interactions.single.section-details.licence CC BY 4.0
DOI n/a
Created 2023-01-26
Published 2023-09-13
Contact n/a
Keywords n/a
Data collection years 2023
Regions, the data was collected in Algeria, American Samoa, Andorra, Belarus, Bulgaria, Canada, Chile, Colombia, Costa Rica, Croatia, Cyprus, Czechia, Denmark, Ecuador, Estonia, Faroe Islands (the), Finland, France
Abstract

The dataset contains postcodes from 97 countries for internal use on the EUPH.

Table 5. Standardised metadata of the data provider EU Pollinator Hub. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Name EU Pollinator Hub
Url
Acronym EUPH
IRI https://app.pollinatorhub.eu/data-providers/euph
Address
Country Belgium
Contact https://www.linkedin.com/company/beelife-european-beekeeping-coordination/ pollinatorhub.eu
Description

The EU Pollinator Hub (EUPH) is a data hub related to pollinators, which is provided by the European Food Safety Authority (EFSA).

Tables

Worldwide postcodes

Table 6. Standardised metadata of the dataset. Reported parameter (Parameter); content of the parameter (Content).
Parameter Content
Unique identifier PSTCD4.WRLDW72.0
Name Worldwide postcodes
Target IRI https://app.pollinatorhub.eu/dataset-discovery/parts/PSTCD4.WRLDW72.0
Table Type File
Licence CC BY 4.0
Description

The table contains 1.534.012 records (171.640.227 bytes) with postcodes from 97 distinct countries (PT, IN, JP, MX, SG, PE, PL, FR, RU, US, RO, ES, TR, KR, UA, GB, LT, AR, AT, IT, SE, AU, DE, DZ, CZ, HR, LV, BR, EE, BG, NO, LU, CH, SK, NL, ZA, ZA, CO, FI, HU, BY, MY, BE, PK, PH, UY, LK, MD, NZ, CA, BD, MA, EC, AZ, DK, RS, CY, TH, SI, GT, DO, MW, CR, CL, HT, MK, PR, RE, IS, FO, BM, GP, MQ, IM, GF, MT, NC, AX, GL, MC, SM, YT, GU, VI, GG, LI, SJ, AD, JE, FM, MP, WF, MH, PM, PW, AS, VA).


Metadata

• Column countrycode links to countries.iso3166_1_2020.alpha2code

Table 7. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Column Name Column Description Datatype Descriptor Unit
countrycode A two-letter code that represents the country name, recommended by ISO standard 3166-1:2020. String iso-639:alpha-2LanguageCode [0.0.LPHLN110] n/a
postalcode Postal code. String eurostat:postcode [0.0.PSTCD378] n/a
placename Name of the location. String Text [0.0.TEXTA315] n/a
adminname1 1. order subdivision (state). String Text [0.0.TEXTA315] n/a
admincode1 1. order subdivision (state). String Text [0.0.TEXTA315] n/a
adminname2 2. order subdivision (county/province). String Text [0.0.TEXTA315] n/a
admincode2 2. order subdivision (county/province). String Text [0.0.TEXTA315] n/a
adminname3 3. order subdivision (community). String Text [0.0.TEXTA315] n/a
admincode3 3. order subdivision (community). String Text [0.0.TEXTA315] n/a
latitude Estimated latitude (WGS84). Decimal number dwc:decimalLatitude [0.0.LTTDE333] °
longitude Estimated longitude (WGS84). Decimal number dwc:decimalLongitude [0.0.LNGTD332] °
accuracy Accuracy of lat/lng from 1=estimated, 4=geonameid, 6=centroid of addresses or shape. Integer number Integer [0.0.NTGER313] n/a


Metadata of individual tables can be found in Annex 1.

Descriptive measures

Table 8. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
countrycode 2 - 2 n/a AD n/a n/a n/a ZA 1,550,798 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 97 ( 0.0% )
postalcode 2 - 15 n/a M9 n/a n/a n/a 78177 CITYSS… 1,550,798 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 721,060 ( 46.5% )
placename 0 - 161 n/a n/a n/a n/a Hamilton (So… 1,550,798 1 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 792,066 ( 51.1% )
adminname1 0 - 48 n/a n/a n/a n/a Región del L… 1,550,798 131,031 ( 8.4% ) 0 ( 0.0% ) 0 ( 0.0% ) 1,560 ( 0.1% )
admincode1 0 - 9 n/a n/a n/a n/a L93000001 1,550,798 136,574 ( 8.8% ) 0 ( 0.0% ) 0 ( 0.0% ) 465 ( 0.0% )
adminname2 0 - 49 n/a n/a n/a n/a Dolores Hida… 1,550,798 256,764 ( 16.6% ) 0 ( 0.0% ) 0 ( 0.0% ) 15,043 ( 1.0% )
admincode2 0 - 9 n/a n/a n/a n/a S12000017 1,550,798 331,793 ( 21.4% ) 0 ( 0.0% ) 0 ( 0.0% ) 12,201 ( 0.8% )
adminname3 0 - 51 n/a n/a n/a n/a San Leonardo… 1,550,798 744,704 ( 48.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 40,539 ( 2.6% )
admincode3 0 - 9 n/a n/a n/a n/a W06000011 1,550,798 1,121,361 ( 72.3% ) 0 ( 0.0% ) 0 ( 0.0% ) 24,021 ( 1.5% )
latitude 8 - 10 30.0224925 -89.997600 19.6338 37.19295 45.0161 90.000000 1,550,798 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 372,372 ( 24.0% )
longitude 8 - 11 21.6569758 -179.260000 -8.8368 16.6563 81.315625 179.310000 1,550,798 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 509,297 ( 32.8% )
accuracy 1 - 1 3.7 1 3 4 4 6 1,550,798 274,588 ( 17.7% ) 0 ( 0.0% ) 0 ( 0.0% ) 7 ( 0.0% )
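
The report does not state how the counts in Table 8 were computed. As an illustration only, the Total, Missing, Zero, Blank and Distinct measures for a single column could be derived as in the sketch below. The function name profile_column, the use of None for SQL NULL, and the exact readings of "zero" and "blank" are assumptions.

```python
def profile_column(values):
    """Count Table 8 style measures for one column.

    values: list of cell values, with None standing in for SQL NULL.
    """
    total = len(values)
    missing = sum(1 for v in values if v is None)
    # "Zero" is read here as the number 0 or the string "0" (assumption).
    zero = sum(1 for v in values
               if v is not None and (v == 0 or v == "0"))
    # "Blank" is read here as a non-empty, whitespace-only string (assumption).
    blank = sum(1 for v in values if isinstance(v, str) and v.isspace())
    # Distinct counts NULL, zero and blank values as values in their own right.
    distinct = len(set(values))
    return {"total": total, "missing": missing, "zero": zero,
            "blank": blank, "distinct": distinct}
```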

Quality measures

Table 9. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
countrycode 100.00% 0.01% PT AS
postalcode 100.00% 46.50% 21825 AD100
placename 100.00% 51.07% Lisboa Encamp
adminname1 91.55% 0.10% null Canillo
admincode1 91.19% 0.03% null L93000001
adminname2 83.44% 0.97% null Rust Stadt
admincode2 78.61% 0.79% null 941
adminname3 51.98% 2.61% null Rust
admincode3 27.69% 1.55% null 10320
latitude 100.00% 24.01% 38.716700 -24.733300
longitude 100.00% 32.84% -9.133300 -63.772200
accuracy 82.29% 0.00% 4 2
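
The formulas behind the completeness and uniqueness percentages are not given in the report. The reported figures are, however, consistent with completeness = (Total - Missing) / Total and uniqueness = Distinct / Total, using the counts from Table 8. The sketch below reproduces that arithmetic; the function names are hypothetical and the formulas are an inference, not a documented method.

```python
def completeness(total: int, missing: int) -> float:
    """Percentage of records whose value is not NULL (inferred formula)."""
    return 100.0 * (total - missing) / total


def uniqueness(total: int, distinct: int) -> float:
    """Percentage of distinct values relative to all records (inferred formula)."""
    return 100.0 * distinct / total


# Counts taken from Table 8 of this report.
adminname1 = completeness(total=1_550_798, missing=131_031)
postalcode = uniqueness(total=1_550_798, distinct=721_060)
print(f"adminname1 completeness: {adminname1:.2f}%")
print(f"postalcode uniqueness: {postalcode:.2f}%")
```

Rounded to two decimals, these values agree with the 91.55% and 46.50% listed for adminname1 and postalcode.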

Changes made to preparatory file

Empty data fields were replaced with NULL in file allCountries_postcode_PREP_MR_230913.csv, as specified in the script used for the conversion of files to csv format according to WI-002 (Raw data preparation) of SOP-006 (Dataset preparation) using the script ConvertTsv2Csv.py.

A total of 788 records were duplicates of 320 records. Most duplicated records occurred in Japan (73%), followed by Switzerland (13%), Mexico (7%), India (4%), Lithuania (2%), Peru, Belarus and France (<1%). Duplicates were removed from the file allCountries_postcode_PREP_MR_230913.csv.
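
How the duplicates were detected and removed is not described. One plausible reading is that records identical in every column were collapsed to a single copy. The sketch below illustrates that reading; the function name drop_duplicates and the representation of records as tuples are assumptions.

```python
from collections import Counter


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence.

    rows: list of hashable records, e.g. tuples of column values.
    Returns (unique_rows, removed_count, duplicated_record_count).
    """
    counts = Counter(rows)
    seen = set()
    unique_rows = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique_rows.append(row)
    # Number of extra copies that were dropped.
    removed_count = len(rows) - len(unique_rows)
    # Number of distinct records that occurred more than once.
    duplicated_record_count = sum(1 for c in counts.values() if c > 1)
    return unique_rows, removed_count, duplicated_record_count
```

Under this reading, the two counters would correspond to the number of duplicate records and the number of records they duplicate, respectively.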

Changes made to data

n/a

Unresolved issues

n/a

References

  1. Anonymous GeoNames. (En) GeoNames. [2025-9-30] www.geonames.org

Annex 1: Table column reports

Table: Worldwide postcodes

Column: countrycode

Table 10. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name countrycode
Description

A two-letter code that represents the country name, recommended by ISO standard 3166-1:2020.

Data type String
Descriptor iso-639:alpha-2LanguageCode [UID:0.0.LPHLN110]
Descriptor description

ISO 639-2 is the alpha-3 code in Codes for the representation of names of languages-- Part 2. There are 21 languages that have alternative codes for bibliographic or terminology purposes. In those cases, each is listed separately and they are designated as "B" (bibliographic) or "T" (terminology). In all other cases there is only one ISO 639-2 code. Multiple codes assigned to the same language are to be considered synonyms. ISO 639-1 is the alpha-2 code.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.LPHLN110
Unit

n/a

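Table 11. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).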
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
countrycode 2 - 2 n/a AD n/a n/a n/a ZA 1,550,798 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 97 ( 0.0% )
Table 12. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
countrycode 100.00% 0.01% PT AS

Data Distribution Top 20

Figure 1. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 2. Distribution of 20 least common values, from lowest to highest.

Completeness

Figure 3. Visualization of completeness of the data in the column.

Uniqueness

Figure 4. Visualization of uniqueness of the data in the column.

Column: postalcode

Table 13. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name postalcode
Description

Postal code.

Data type String
Descriptor eurostat:postcode [UID:0.0.PSTCD378]
Descriptor description

A postal code (also known locally in various English-speaking countries throughout the world as a postcode, post code, PIN or ZIP Code) is a series of letters or digits or both, sometimes including spaces or punctuation, included in a postal address for the purpose of sorting mail.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.PSTCD378
Unit

n/a

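Table 14. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).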
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
postalcode 2 - 15 n/a M9 n/a n/a n/a 78177 CITYSS… 1,550,798 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 721,060 ( 46.5% )
Table 15. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
postalcode 100.00% 46.50% 21825 AD100

Completeness

Figure 5. Visualization of completeness of the data in the column.

Uniqueness

Figure 6. Visualization of uniqueness of the data in the column.

Column: placename

Table 16. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name placename
Description

Name of the location.

Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

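Table 17. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).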
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
placename 0 - 161 n/a n/a n/a n/a Hamilton (So… 1,550,798 1 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 792,066 ( 51.1% )
Table 18. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
placename 100.00% 51.07% Lisboa Encamp

Completeness

Figure 7. Visualization of completeness of the data in the column.

Uniqueness

Figure 8. Visualization of uniqueness of the data in the column.

Column: adminname1

Table 19. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name adminname1
Description
  1. order subdivision (state).
Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

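Table 20. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).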
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
adminname1 0 - 48 n/a n/a n/a n/a Región del L… 1,550,798 131,031 ( 8.4% ) 0 ( 0.0% ) 0 ( 0.0% ) 1,560 ( 0.1% )
Table 21. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
adminname1 91.55% 0.10% null Canillo

Completeness

Figure 9. Visualization of completeness of the data in the column.

Uniqueness

Figure 10. Visualization of uniqueness of the data in the column.

Column: admincode1

Table 22. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name admincode1
Description
  1. order subdivision (state).
Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

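Table 23. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).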
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
admincode1 0 - 9 n/a n/a n/a n/a L93000001 1,550,798 136,574 ( 8.8% ) 0 ( 0.0% ) 0 ( 0.0% ) 465 ( 0.0% )
Table 24. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
admincode1 91.19% 0.03% null L93000001

Data Distribution Top 20

Figure 11. Distribution of 20 most common values, from highest to lowest.

Data Distribution Bottom 20

Figure 12. Distribution of 20 least common values, from lowest to highest.

Completeness

Figure 13. Visualization of completeness of the data in the column.

Uniqueness

Figure 14. Visualization of uniqueness of the data in the column.

Column: adminname2

Table 25. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name adminname2
Description
  2. order subdivision (county/province).
Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

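Table 26. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).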
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
adminname2 0 - 49 n/a n/a n/a n/a Dolores Hida… 1,550,798 256,764 ( 16.6% ) 0 ( 0.0% ) 0 ( 0.0% ) 15,043 ( 1.0% )
Table 27. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
adminname2 83.44% 0.97% null Rust Stadt

Completeness

Figure 15. Visualization of completeness of the data in the column.

Uniqueness

Figure 16. Visualization of uniqueness of the data in the column.

Column: admincode2

Table 28. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name admincode2
Description
  2. order subdivision (county/province).
Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

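Table 29. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).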
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
admincode2 0 - 9 n/a n/a n/a n/a S12000017 1,550,798 331,793 ( 21.4% ) 0 ( 0.0% ) 0 ( 0.0% ) 12,201 ( 0.8% )
Table 30. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
admincode2 78.61% 0.79% null 941

Completeness

Figure 17. Visualization of completeness of the data in the column.

Uniqueness

Figure 18. Visualization of uniqueness of the data in the column.

Column: adminname3

Table 31. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name adminname3
Description
  3. order subdivision (community).
Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

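Table 32. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).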
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
adminname3 0 - 51 n/a n/a n/a n/a San Leonardo… 1,550,798 744,704 ( 48.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 40,539 ( 2.6% )
Table 33. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
adminname3 51.98% 2.61% null Rust

Completeness

Figure 19. Visualization of completeness of the data in the column.

Uniqueness

Figure 20. Visualization of uniqueness of the data in the column.

Column: admincode3

Table 34. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name admincode3
Description
  3. order subdivision (community).
Data type String
Descriptor Text [UID:0.0.TEXTA315]
Descriptor description

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. String may also denote more general arrays or other sequence (or list) data types and structures.

Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.TEXTA315
Unit

n/a

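Table 35. Content analysis of the table. Column name (Name); range of length of characters (Length); arithmetic mean of values in column (Mean); lowest value in column (Min); first quartile of values in column (Q1); median of values in column (Median); third quartile of values in column (Q3); highest value in column (Max); number of records (Total); number and percentage (between brackets) of all values containing NULL (Missing), the value 0 (Zero), exclusively blank characters (Blank), and of distinct values including NULL, Zero and blank (Distinct).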
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
admincode3 0 - 9 n/a n/a n/a n/a W06000011 1,550,798 1,121,361 ( 72.3% ) 0 ( 0.0% ) 0 ( 0.0% ) 24,021 ( 1.5% )
Table 36. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
admincode3 27.69% 1.55% null 10320

Completeness

Figure 21. Visualization of completeness of the data in the column.

Uniqueness

Figure 22. Visualization of uniqueness of the data in the column.

Column: latitude

Table 37. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name latitude
Description

estimated latitude (wgs84).

Data type Decimal number
Descriptor dwc:decimalLatitude [UID:0.0.LTTDE333]
Descriptor description

The geographic latitude (in decimal degrees, using the spatial reference system given in dwc:geodeticDatum) of the geographic center of a dcterms:Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive.

Descriptor target IRI http://rs.tdwg.org/dwc/terms/decimalLatitude
Unit °

Table 38. Statistical summary of the column. Column name (Column Name); range (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of records (Total); number and share of missing (Missing), zero (Zero), blank (Blank), and distinct (Distinct) values.
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
latitude 8 - 10 30.0224925 -89.997600 19.6338 37.19295 45.0161 90.000000 1,550,798 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 372,372 ( 24.0% )
Table 39. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
latitude 100.00% 24.01% 38.716700 -24.733300

Continuous Data Distribution

Figure 23. Distribution of values in the column.

Outliers

Figure 24. Visualization of median, min, max, and outliers in the column.
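
The report does not state how outliers are defined for this figure. As an illustration only, the sketch below assumes the common Tukey rule (values more than 1.5 times the interquartile range beyond Q1 or Q3) and applies it to the latitude quartiles reported above; the function name and the factor k are assumptions, not part of the report.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences under the assumed Tukey rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes reported for the latitude column.
Q1, Q3 = 19.6338, 45.0161
MINIMUM, MAXIMUM = -89.9976, 90.0

lower, upper = tukey_fences(Q1, Q3)
print(f"fences: [{lower:.5f}, {upper:.5f}]")
print("minimum flagged as outlier:", MINIMUM < lower)
print("maximum flagged as outlier:", MAXIMUM > upper)
```

Under this assumed rule, both the reported minimum and the reported maximum fall outside the fences. For coordinates this mainly reflects the uneven geographic coverage of the dataset and does not by itself indicate erroneous values.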

Completeness

Figure 25. Visualization of completeness of the data in the column.

Uniqueness

Figure 26. Visualization of uniqueness of the data in the column.

Column: longitude

Table 40. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name longitude
Description Estimated longitude (WGS84).
Data type Decimal number
Descriptor dwc:decimalLongitude [UID:0.0.LNGTD332]
Descriptor description The geographic longitude (in decimal degrees, using the spatial reference system given in dwc:geodeticDatum) of the geographic center of a dcterms:Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive.
Descriptor target IRI http://rs.tdwg.org/dwc/terms/decimalLongitude
Unit °
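
The descriptor definitions give legal ranges of -90 to 90 for latitude and -180 to 180 for longitude, both inclusive. A minimal sketch of a range check based on those definitions follows; the function name is hypothetical, and pairing the reported column extremes into coordinate pairs is done only for illustration.

```python
def is_valid_coordinate(lat: float, lon: float) -> bool:
    """Check a WGS84 coordinate pair against the Darwin Core legal ranges."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


# Column extremes reported in this document, paired for illustration only.
print(is_valid_coordinate(-89.9976, -179.26))  # True
print(is_valid_coordinate(90.0, 179.31))       # True
print(is_valid_coordinate(95.0, 10.0))         # False: latitude out of range
```

The reported minima and maxima of both columns lie within the legal ranges, so a check of this kind would not reject any record on range grounds alone.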

Table 41. Statistical summary of the column. Column name (Column Name); range (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of records (Total); number and share of missing (Missing), zero (Zero), blank (Blank), and distinct (Distinct) values.
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
longitude 8 - 11 21.6569758 -179.260000 -8.8368 16.6563 81.315625 179.310000 1,550,798 0 ( 0.0% ) 0 ( 0.0% ) 0 ( 0.0% ) 509,297 ( 32.8% )
Table 42. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
longitude 100.00% 32.84% -9.133300 -63.772200

Continuous Data Distribution

Figure 27. Distribution of values in the column.

Outliers

Figure 28. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 29. Visualization of completeness of the data in the column.

Uniqueness

Figure 30. Visualization of uniqueness of the data in the column.

Column: accuracy

Table 43. Structural analysis of the table. Column name (Name); concise description of the column (Description); data type in which values are stored (Data type); EUPH-Descriptor (Descriptor); unit in which the values are provided (Unit).
Parameter Content
Column name accuracy
Description Accuracy of the latitude/longitude values, coded from 1 to 6: 1 = estimated, 4 = geonameid, 6 = centroid of addresses or shape.
Data type Integer number
Descriptor Integer [UID:0.0.NTGER313]
Descriptor description A number with no fractional part, including the negative and positive numbers as well as zero.
Descriptor target IRI https://app.pollinatorhub.eu/vocabulary/descriptors/0.0.NTGER313
Unit n/a
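
The column description documents only three of the accuracy codes (1, 4 and 6), while the column also contains other values and missing entries. The sketch below shows one way such codes could be translated into labels; the mapping name, the function name, and the handling of undocumented or missing codes are assumptions made for illustration.

```python
from typing import Optional

# Only the codes documented in the column description are mapped.
ACCURACY_LABELS = {
    1: "estimated",
    4: "geonameid",
    6: "centroid of addresses or shape",
}


def describe_accuracy(code: Optional[int]) -> str:
    """Translate an accuracy code into a label (hypothetical helper)."""
    if code is None:
        return "missing"
    # Codes without a documented meaning are reported as such, not guessed.
    return ACCURACY_LABELS.get(code, f"undocumented code {code}")


print(describe_accuracy(4))     # geonameid
print(describe_accuracy(2))     # undocumented code 2
print(describe_accuracy(None))  # missing
```

Keeping undocumented codes explicit avoids assigning a meaning that the source description does not provide.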

Table 44. Statistical summary of the column. Column name (Column Name); range (Range); mean (Mean); minimum (Minimum); first quartile (Q1); median (Median); third quartile (Q3); maximum (Maximum); total number of records (Total); number and share of missing (Missing), zero (Zero), blank (Blank), and distinct (Distinct) values.
Column Name Range Mean Minimum Q1 Median Q3 Maximum Total Missing Zero Blank Distinct
accuracy 1 - 1 3.7 1 3 4 4 6 1,550,798 274,588 ( 17.7% ) 0 ( 0.0% ) 0 ( 0.0% ) 7 ( 0.0% )
Table 45. Quality analysis of the table. Column name (Name); completeness of the column (Completeness); uniqueness of the column (Uniqueness); most common value in the column (Most Common Value); least common value in the column (Least Common Value).
Column Name Completeness Uniqueness Most Common Value Least Common Value
accuracy 82.29% 0.00% 4 2

Data Distribution Top 20

Figure 31. Distribution of 20 most common values, from highest to lowest.

Continuous Data Distribution

Figure 32. Distribution of values in the column.

Outliers

Figure 33. Visualization of median, min, max, and outliers in the column.

Completeness

Figure 34. Visualization of completeness of the data in the column.

Uniqueness

Figure 35. Visualization of uniqueness of the data in the column.