The bdl
package is an interface to Local Data Bank(Bank
Danych Lokalnych - bdl) API with a set of
useful tools like quick plotting using data from the data bank.
Intro
Working with bdl
is based on id codes. Most of the data
downloading functions require specifying one or vector of multiple unit
or variable ids as a string.
It is recommended to use a private API key which u can get here. To apply it
use: options(bdl.api_private_key ="your_key")
Also, every function returns data in Polish by default. If you would
like to get data in English, just add lang = "en"
to any
function.
Any metadata information (unit levels, aggregates, NUTS code explanation, etc.) can be found here.
Searching unit id
When searching for unit id, we can use two methods:
- Direct search
search_units()
- Tree listing by
get_units()
Units consist of 6 levels:
The lowest - seventh level has its own separate functions with suffix
localities
. Warning - the localities
functions
have a different set of arguments. Check package or API documentation
for more info.
Direct search
Direct searching search_units()
takes couple different
arguments like:
-
name
- required search phrase (can be empty string) -
level
- narrows returned units to given level
and more. To look for more arguments on any given function check package or API documentation.
search_units(name = "wro")
search_units(name = "", level = 3)
Tree listing
To get all units available in local data bank run
get_units()
without any argument(warning - it can eat data
limit very fast around 4.5k rows):
To narrow the list add unitParentId
. The function will
return all children units for a given parent at all levels. Add
level
argument to filter units even further.
get_units(parentId = "000000000000", level = 5)
Searching subject and variable id
Subjects are themed directories of variables.
We have two searching methods for both subjects and variables:
- Direct search
search_variables()
andsearch_subjects()
- Subject tree listing by
get_subjects()
andget_variables()
Subjects
To directly search for subject we just provide search phrase:
search_subjects("lud")
Subjects consist of 3 levels (categories, groups, subgroups) -
K
, G
and P
respectively. The
fourth level of the subject (child of a subgroup) would be
variables.
To list all top level subjects use get_subjects()
:
To list sub-subjects to given category or group use
get_subjects()
with parentId
argument:
get_subjects(parentId = "K3")
get_subjects(parentId = "G7")
Variables
Firstly you can list variables for given subject (subgroup):
get_variables("P2425")
Secondly, you can direct search variables with
search_variables()
. You can use an empty string as
name
to list all variables but I strongly advise against as
it has around 40 000 rows and you will probably hit data limit.
search_variables("samochod")
You can narrow the search to the given subject - subgroup:
search_variables("lud", subjectId = "P2425")
Downloading data
If you picked unit and variable codes, you are ready to download data. You can do this two ways:
- Download data of multiple variables for a single unit
get_data_by_unit()
- Download data of single variable for multiple units
get_data_by_variable()
Single unit, multiple variables
We will use get_data_by_unit()
. We specify our single
unit as unitId
string argument and variables by a vector of
strings. Optionally we can specify years of data. If not all available
years are used.
get_data_by_unit(unitId = "023200000000", varId = "3643")
get_data_by_unit(unitId = "023200000000", varId = c("3643", "2137", "148190"))
To get more information about data we can add type
argument and set it to "label"
to add an additional column
with the variable info.
get_data_by_unit(unitId = "023200000000", varId = "3643", type = "label")
Multiple units, single variable
We will use get_data_by_variable()
. We specify our
single variable as varId
string argument. If no
unitParentId
is provided, the function will return all
available units for a given variable. Setting unitParentId
will return all available children units (on all levels). To narrow unit
level set unitLevel
. Optionally we can specify years of
data. If not all available years are used.
get_data_by_variable("420", unitParentId = "011210000000", year = 2013:2016)
get_data_by_variable("420", unitLevel = "2", year = 2013:2016)
Useful tools
The bdl
package provides a couple of additional
functions for summarizing and visualizing data.
Summary
Data downloaded via get_data_by_unit()
or
get_data_by_variable()
and their locality versions can be
easily summarized by summary()
:
df <- get_data_by_variable(varId = "3643", unitParentId = "010000000000")
summary(df)
Plotting
Plotting functions in this package are interfaces to the data
downloading functions. Some of them require specifying
data_type
- a method for downloading data, and the rest of
the arguments will be relevant to specify data_type
function. Check documentation for more details.
pie_plot(data_type ="variable" ,"1", "2018",unitParentId="042214300000", unitLevel = "6")
Scatter plot is unique - requires vector of only 2 variables.
scatter_2var_plot(data_type ="variable" ,c("60559","415"), unitLevel = "2")
Map generation
The bdl
package comes with the bdl.maps
dataset containing spatial maps for each Poland’s level.
generate_map()
use them to generate maps filled with the
bdl data. Use unitLevel
to change the type of map. When the
lower level is chosen, the map generation can be more time consuming as
it has more spatial data to process. This function will download and
load maps automatically. In case of any errors you can download them
manually here.
Download data file and double-click to load it to environment.
generate_map(varId = "60559", year = "2017", unitLevel = 3)
Multi download
Downloading functions get_data_by_unit()
and
get_data_by_variable()
have alternative “multi” downloading
mode. Function that would work for example single unit, if provided a
vector will make additional column with values for each unit
provided:
get_data_by_unit(unitId = c("023200000000", "020800000000"), varId = c("3643", "2137", "148190"))
Or multiple variables for get_data_by_variable()
:
get_data_by_variable(varId =c("3643","420"), unitParentId = "010000000000")
This mode works for the locality version as well.
More consistent method of downloading multiple variables for multiple
units is provided by get_panel_data()
function:
get_panel_data(unitId = c("030210101000", "030210105000", "030210106000"), varId = c("60270", "461668"), year = c(2015:2016))
It offers also parameter ggplot = TRUE
which produces
output in the long form suitable for plotting with ggplot package:
library(ggplot2)
df <- get_panel_data(unitId = c("030210101000", "030210105000", "030210106000"), varId = c("60270", "461668"), year = c(2015:2018), ggplot = TRUE)
ggplot(df,aes(x=year, y= values, color = unit)) + geom_line(aes(linetype = variables)) + scale_color_discrete(labels = c("A", "B", "C")) + scale_linetype_discrete(labels = c("X", "Y"))