Downloading and using data from bdl
The bdl package is an interface to the Local Data Bank (Bank Danych Lokalnych, BDL) API, with a set of useful tools such as quick plotting of data from the data bank.
bdl is based on id codes. Most data downloading functions require one unit or variable id, or a vector of several, passed as strings.
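Because ids are passed as strings, a quick check before calling the API can catch ids that were accidentally stored as numbers and lost their leading zeros. The sketch below is a hypothetical helper, not part of bdl, and it assumes unit ids are 12-digit strings, which is inferred only from the unit ids used in this document.

```r
# Hypothetical helper (not part of bdl): TRUE for character strings made of
# exactly 12 digits. The length of 12 is inferred from the ids shown in this
# document, not taken from the API documentation.
is_unit_id <- function(x) {
  is.character(x) & grepl("^[0-9]{12}$", x)
}

is_unit_id("023200000000")  # character id, leading zero preserved
is_unit_id(23200000000)     # numeric input is rejected
```

The helper is vectorised, so it can also be applied to a whole vector of candidate ids at once.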
It is recommended to use a private API key, which you can get here. Apply the key before calling the download functions.
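One way to apply the key is sketched below. It assumes the package reads the key from an R option named bdl.api_private_key; check the package documentation for the exact name. The value "your_key" is a placeholder.

```r
# Assumption: bdl looks the key up in the "bdl.api_private_key" option.
# Replace the placeholder with your own key.
options(bdl.api_private_key = "your_key")

# Confirm that the option is set for the current session.
getOption("bdl.api_private_key")
```

Options set this way last only for the current R session, so the line has to be run again after a restart.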
Also, every function returns data in Polish by default. If you would like to get data in English, just add
lang = "en" to any function.
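If most of your calls should return English data, a small wrapper saves repeating the argument. This is a sketch only: in_english is a hypothetical helper, not a bdl function, and it relies on the statement above that every function accepts lang.

```r
# Hypothetical convenience wrapper (not part of bdl): call any function with
# lang = "en" appended to the arguments that were passed in.
in_english <- function(fun, ...) {
  fun(..., lang = "en")
}

# Usage (requires the bdl package and network access):
# in_english(bdl::get_data_by_unit, unitId = "023200000000", varId = "3643")
```

Passing lang yourself as well would supply the argument twice and raise an error, so use the wrapper only for calls that should be in English.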
Any metadata information (unit levels, aggregates, NUTS code explanation, etc.) can be found here.
When searching for a unit id, we can use two methods: direct search with search_units() and tree listing with get_units().
Units consist of 6 levels:
The lowest, seventh level has its own separate functions with the suffix localities. Warning: the localities functions take a different set of arguments. Check the package or API documentation for more information.
search_units() takes several arguments, including:
- name: required search phrase (can be an empty string)
- level: narrows the returned units to the given level
and more. For the full argument list of any function, check the package or API documentation.
To get all units available in the Local Data Bank, run get_units() without any argument. Warning: this returns around 4.5k rows and can use up the data limit very quickly.
To narrow the list, add parentId. The function will then return all child units of the given parent at all levels. Add the level argument to filter units even further.
get_units(parentId = "000000000000", level = 5)
Subjects are themed directories of variables.
We have two searching methods for both subjects and variables:
- Direct search
- Subject tree listing by
To search for a subject directly, we just provide a search phrase:
Subjects consist of 3 levels: categories, groups and subgroups. Subgroup ids carry the prefix P (for example "P2425"). The fourth level of the subject tree (the children of a subgroup) consists of variables.
To list all top level subjects use
To list sub-subjects to given category or group use
First, you can list the variables of a given subject (subgroup):
Second, you can search for variables directly with search_variables(). You can use an empty string as name to list all variables, but this is strongly discouraged: the result has around 40,000 rows and you will probably hit the data limit.
You can narrow the search to a given subject (subgroup):
search_variables("lud", subjectId = "P2425")
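Since an unrestricted search can exhaust the data limit, a small guard may be useful. The sketch below is a hypothetical wrapper: safe_search_variables and its allow_all and search arguments are not part of bdl. It refuses an empty phrase unless a subject filter is given or the caller explicitly opts in. The name and subjectId arguments are the ones used in this document.

```r
# Hypothetical wrapper around search_variables(); not part of the bdl package.
# `search` defaults to bdl::search_variables and can be replaced by a stub
# when experimenting offline.
safe_search_variables <- function(phrase, subject_id = NULL, allow_all = FALSE,
                                  search = bdl::search_variables) {
  stopifnot(is.character(phrase), length(phrase) == 1)
  # An empty phrase with no subject filter would list every variable.
  if (!nzchar(phrase) && is.null(subject_id) && !allow_all) {
    stop("Empty search phrase without subjectId; set allow_all = TRUE to proceed.")
  }
  # Pass subjectId only when it was supplied.
  args <- list(name = phrase)
  if (!is.null(subject_id)) args$subjectId <- subject_id
  do.call(search, args)
}

# Usage (requires the bdl package and network access):
# safe_search_variables("lud", subject_id = "P2425")
```

The default for search is evaluated only when the call is actually made, so the blocked case never touches the API.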
Once you have picked unit and variable codes, you are ready to download data. You can do this in two ways:
- Download data of multiple variables for a single unit
- Download data of single variable for multiple units
We will use get_data_by_unit(). We specify our single unit as the unitId string argument and the variables as a vector of strings in varId. Optionally, we can specify the years of data; if we do not, all available years are used.
get_data_by_unit(unitId = "023200000000", varId = "3643")
get_data_by_unit(unitId = "023200000000", varId = c("3643", "2137", "148190"))
To get more information about the data, we can set the type argument to "label", which adds a column with the variable info.
get_data_by_unit(unitId = "023200000000", varId = "3643", type = "label")
We will use get_data_by_variable(). We specify our single variable as the varId string argument. If no unitParentId is provided, the function returns all available units for the given variable. Setting unitParentId returns all available child units of that parent (at all levels). To narrow the result to one unit level, set unitLevel. Optionally, we can specify the years of data; if we do not, all available years are used.
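The optional arguments above combine freely, so it can help to build the argument list first and pass only what was actually set. The sketch below uses a hypothetical helper, variable_args, which is not part of bdl. It assumes the years argument is called year, as in get_panel_data() and generate_map() elsewhere in this document. The ids are reused from other examples here purely as placeholders.

```r
# Hypothetical helper (not part of bdl): collect only the arguments that were
# actually supplied, so that omitted ones fall back to the function defaults.
variable_args <- function(var_id, parent_id = NULL, level = NULL, years = NULL) {
  args <- list(varId = var_id, unitParentId = parent_id,
               unitLevel = level, year = years)
  # Drop every argument that was left as NULL.
  args[!vapply(args, is.null, logical(1))]
}

# All units for one variable:
str(variable_args("3643"))
# Children of one parent, narrowed to a single level and two years:
str(variable_args("3643", parent_id = "042214300000", level = 6, years = 2015:2016))

# Usage (requires the bdl package and network access):
# do.call(bdl::get_data_by_variable, variable_args("3643", level = 2))
```

Inspecting the list with str() before the call is a cheap way to confirm what will be requested without spending any of the data limit.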
bdl package provides a couple of additional functions for summarizing and visualizing data.
Plotting functions in this package are interfaces to the data downloading functions. Some of them require data_type, the method used for downloading data; the remaining arguments are those of the download function selected by data_type. Check the documentation for more details.
pie_plot(data_type = "variable", "1", "2018", unitParentId = "042214300000", unitLevel = "6")
The scatter plot is unique: it requires a vector of exactly 2 variables.
The bdl package comes with the bdl.maps dataset, which contains spatial maps for each unit level of Poland. generate_map() uses them to generate maps filled with bdl data. Use unitLevel to change the type of map. Choosing a lower level can make map generation slower, as there is more spatial data to process. This function downloads and loads the maps automatically. In case of any errors, you can download them manually here.
Download the data file and double-click it to load it into the environment.
generate_map(varId = "60559", year = "2017", unitLevel = 3)
get_data_by_unit() and get_data_by_variable() have an alternative “multi” downloading mode. A function that normally takes a single unit will, if given a vector of units, add a column of values for each unit provided. Likewise, a function that normally takes a single variable can be given a vector of variables.
This mode works for the locality versions as well.
A more consistent method of downloading multiple variables for multiple units is provided by get_panel_data():
get_panel_data(unitId = c("030210101000", "030210105000", "030210106000"), varId = c("60270", "461668"), year = c(2015:2016))
It also offers the parameter ggplot = TRUE, which produces output in long form suitable for plotting with the ggplot2 package:
library(ggplot2)
df <- get_panel_data(unitId = c("030210101000", "030210105000", "030210106000"),
                     varId = c("60270", "461668"), year = c(2015:2018), ggplot = TRUE)
ggplot(df, aes(x = year, y = values, color = unit)) +
  geom_line(aes(linetype = variables)) +
  scale_color_discrete(labels = c("A", "B", "C")) +
  scale_linetype_discrete(labels = c("X", "Y"))