Title: | Tools for Accessing Various Datasets Developed by the Foundation SmarterPoland.pl |
---|---|
Description: | Tools for accessing and processing datasets prepared by the Foundation SmarterPoland.pl. Among others: access to the APIs of Google Maps, the Central Statistical Office of Poland, MojePanstwo, Eurostat, WHO and other sources. |
Authors: | Przemyslaw Biecek |
Maintainer: | Przemyslaw Biecek <[email protected]> |
License: | GPL-3 |
Version: | 1.8.1 |
Built: | 2024-11-12 02:54:41 UTC |
Source: | https://github.com/pbiecek/smarterpoland |
See also: getMillwardBrown, getEurostatRCV, getBDLseries, getWeatherForecast.
## Not run:
# download the dataset 'Pupil/Student - teacher ratio and average class' from eurostat
# for more developed API see https://github.com/rOpenGov/eurostat
tmp <- getEurostatRCV(kod = "educ_iste")
head(tmp)

# download the dataset 'People killed in road accidents' from eurostat
# and plot a maptable for selected countries
library(ggplot2)
t1 <- getEurostatRCV("tsdtr420")
# the original row filter was truncated in this manual;
# the country list below is an assumed example
t1 <- t1[t1$geo %in% c("UK", "SK", "FR", "PL", "ES", "PT", "LV"),]
ggplot(t1, aes(time, value, color=sex, group=sex)) +
  geom_line() +
  facet_wrap(~geo)
## End(Not run)
Access to the GUS Bank Danych Lokalnych using the API developed by MojePanstwo.
Download and parse data from Bank Danych Lokalnych using the API developed by MojePanstwo.
getBDLtree(raw = FALSE, debug = 0)

getBDLsearch(query = "", debug = 0, raw = FALSE)

getBDLseries(metric_id = "", slice = NULL, time_range = NULL,
  wojewodztwo_id = NULL, powiat_id = NULL, gmina_id = NULL,
  meta = NULL, debug = 0, raw = FALSE)

getMPgminy(debug = 0)

getMPpowiaty(debug = 0)

getMPwojewodztwa(debug = 0)
debug | Level of debug info: 0 for no debug, 1 or 2 for info about processed groups. |
raw | If raw = TRUE the resulting JSON is returned without any transformation. If raw = FALSE the results are transformed into a data.frame. |
query | A query for BDL search. |
metric_id | Metric id; if unknown, look for it in the BDL tree or BDL search. |
slice | A table of dimension ids, in the format [1,34,*]. Use '*' to choose all dimensions (or use an empty string). |
time_range | A year or a range (such as 2000:2010); empty means the full range. |
wojewodztwo_id | Voivodeship id or '*' for all. |
powiat_id | County id or '*' for all. This is an internal ID. Use |
gmina_id | Subcounty id or '*' for all. This is an internal ID. Use |
meta | Should metadata be returned? |
The function getMPgminy() returns a data frame with identifiers id/TERYT for each subcounty.
The function getMPpowiaty() returns a data frame with identifiers id for each county.
The function getBDLtree() returns a data frame with identifiers of resources in Bank Danych Lokalnych.
Przemyslaw Biecek
The API of Bank Danych Lokalnych developed by MojePanstwo is described at https://mojepanstwo.pl/api/dane/get_dane_dataset
## Not run:
# the data is downloaded and parsed from the Internet
# note that this dataset is pre-calculated in the package
BDLtree <- getBDLtree(debug = 2)
head(BDLtree)

BDLtransport <- getBDLsearch("transport")
head(BDLtransport)

BDLseries <- getBDLseries(metric_id = 1)
head(BDLseries)

gminy <- getMPgminy()
head(gminy)

powiaty <- getMPpowiaty()
head(powiaty)
## End(Not run)
A subset of world.cities from the maps package, extracted in order to shrink the number of dependencies. Only cities with pop > 50k are kept.
Przemyslaw Biecek [based on world.cities]
## Not run:
library(maps)
data(world.cities)
cities_lon_lat <- world.cities[!duplicated(world.cities$name),]
rownames(cities_lon_lat) <- cities_lon_lat[,1]
cities_lon_lat <- cities_lon_lat[cities_lon_lat$pop > 50000,]
cities_lon_lat <- cities_lon_lat[,4:5]
## End(Not run)
Data from World Health Organization database http://apps.who.int/gho/data/view.main.CBDR2040. Based on the example from Grammar of Graphics by Leland Wilkinson.
Przemyslaw Biecek [based on WHO data]
## Not run:
# the countries dataset is documented as part of this package
library(SmarterPoland)
data(countries)
head(countries)
## End(Not run)
Access to hourly and daily weather forecasts with the use of Dark Sky API.
getWeatherForecast(apiKey, lat = NA, lon = NA, city = NA, raw=FALSE)
apiKey | You need a Dark Sky apiKey in order to access weather forecasts. See https://developer.forecast.io/ for more details. |
lat | The latitude coordinate for which the prediction is to be made. |
lon | The longitude coordinate for which the prediction is to be made. |
city | Instead of lat and lon you may specify the name of the city for which the prediction is to be made. |
raw | If TRUE then no parsing is done. The function getWeatherForecast() just downloads a forecast and returns it as a list. |
The function getWeatherForecast() returns a list of three datasets: now, by.hour and by.day.
The now and by.hour datasets contain predictions. For each time point the following information is collected:
time, summary, icon, precipIntensity, precipProbability, temperature, apparentTemperature, dewPoint, humidity, windSpeed, windBearing, visibility, cloudCover, pressure, ozone, temperatureCelsius, apparentTemperatureCelsius
Daily predictions (the by.day component) contain the following information:
time, summary, icon, sunriseTime, sunsetTime, moonPhase, precipIntensity, precipIntensityMax, precipProbability, temperatureMin, temperatureMinTime, temperatureMax, temperatureMaxTime, apparentTemperatureMin, apparentTemperatureMinTime, apparentTemperatureMax, apparentTemperatureMaxTime, dewPoint, humidity, windSpeed, windBearing, visibility, cloudCover, pressure, ozone, precipIntensityMaxTime, precipType, temperatureMaxCelsius, temperatureMinCelsius, apparentTemperatureMaxCelsius, apparentTemperatureMinCelsius
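The field lists above pair each temperature column with a Celsius counterpart (for example temperature and temperatureCelsius). The sketch below shows how such a column could be derived. It assumes the base values are in degrees Fahrenheit, which is not stated in this manual; the helper `f_to_c` and the toy data frame are hypothetical and not part of the package.

```r
# Hypothetical helper: convert degrees Fahrenheit to degrees Celsius.
# Assumption: the API reports temperature in Fahrenheit.
f_to_c <- function(f) (f - 32) * 5 / 9

# Toy stand-in for the by.hour component (values are made up).
by_hour <- data.frame(temperature = c(32, 50, 212))

# Add the Celsius column next to the original one.
by_hour$temperatureCelsius <- f_to_c(by_hour$temperature)
print(by_hour)
```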
Przemyslaw Biecek
The Dark Sky API for weather forecasts is described at https://developer.forecast.io/
## Not run:
# you have to have apiKey to execute these examples
library(scales)
library(ggplot2)
prognoza <- getWeatherForecast(apiKey, city='Warsaw')
ggplot(prognoza$by.hour, aes(y=temperatureCelsius, x=time)) +
  geom_line() +
  geom_point() +
  geom_point(data=prognoza$now, size=10, color='red') +
  theme(title=element_text(size=20), axis.text=element_text(size=20)) +
  scale_x_datetime(breaks = date_breaks("3 hour"),
                   minor_breaks = date_breaks("1 hour"),
                   # the original label format string was lost in this manual;
                   # "%H:%M" is an assumed placeholder
                   labels = date_format("%H:%M")) +
  ylab("") + xlab("") +
  ggtitle("Prognoza temperatury dla Warszawy")
## End(Not run)
Download a dictionary for given coded variable from Eurostat (ec.europa.eu/eurostat).
getEurostatDictionary(dictname)
dictname | Character; the dictionary for the given variable name will be downloaded. |
A data.frame with two columns, first with code names and second with full names.
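A two-column dictionary of this kind is typically used to replace codes with readable labels. The sketch below illustrates that with toy data only: the column names `code` and `name`, the dictionary contents and the data values are assumptions made for illustration, since the manual does not give the actual column names.

```r
# Hypothetical dictionary in the two-column shape described above
# (first column: code names, second column: full names).
dict <- data.frame(
  code = c("PL", "FR", "ES"),
  name = c("Poland", "France", "Spain"),
  stringsAsFactors = FALSE
)

# Toy data with coded values (made up).
dat <- data.frame(
  geo = c("FR", "PL", "FR"),
  value = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# match() looks up each code and keeps the row order of `dat`.
dat$geo_name <- dict$name[match(dat$geo, dict$code)]
print(dat)
```

Using match() rather than merge() keeps the rows in their original order, which matters when the data is plotted or compared afterwards.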
Przemyslaw Biecek
The dictionary is downloaded from http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=dic....
See also: getEurostatRCV, getEurostatRaw, grepEurostatTOC.
## Not run:
tmp <- getEurostatDictionary("crop_pro")
head(tmp)
## End(Not run)
Download a dataset from the eurostat database. The dataset is transformed into the tabular format.
getEurostatRaw(kod = "educ_iste", rowRegExp=NULL, colRegExp=NULL, strip.white = TRUE)
kod | A code name for the dataset of interest. See the table of contents of eurostat datasets for more details. |
rowRegExp | If not NULL, this regular expression is used to filter the rows of the downloaded file. |
colRegExp | If not NULL, this regular expression is used to filter the columns of the downloaded file. |
strip.white | Passed to the internal |
A dataset in data.frame format. The first column contains the names of cases. Column names usually correspond to years.
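The effect of row and column patterns can be illustrated on a toy table of the shape described above. This is a sketch under stated assumptions, not the package's implementation: it assumes that matching rows and columns are kept, and the data frame, the column name `case` and both patterns are made up for illustration.

```r
# Toy table: first column holds case names, the rest are years (values made up).
raw <- data.frame(
  case   = c("TOTAL,PL", "TOTAL,FR", "F,PL"),
  `2010` = c(1, 2, 3),
  `2011` = c(4, 5, 6),
  `2012` = c(7, 8, 9),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

row_pattern <- "^TOTAL"   # hypothetical row filter
col_pattern <- "201[01]"  # hypothetical column filter

# Keep rows whose case name matches the row pattern.
keep_rows <- grepl(row_pattern, raw$case)
# Always keep the case column, plus year columns matching the column pattern.
keep_cols <- names(raw) == "case" | grepl(col_pattern, names(raw))

filtered <- raw[keep_rows, keep_cols]
print(filtered)
```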
Przemyslaw Biecek
Data is downloaded from the http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing website.
See also: getEurostatTOC, getEurostatRaw, grepEurostatTOC.
## Not run:
tmp <- getEurostatRaw(kod = "educ_iste")
head(tmp)
## End(Not run)
Download a dataset from the eurostat database. The dataset is transformed into the molten / row-column-value format (RCV).
getEurostatRCV(kod = "educ_iste", ...)
kod | A code name for the dataset of interest. See the table of contents of eurostat datasets for more details. |
... | Other parameters that are passed to getEurostatRaw(). |
A dataset in the molten format with the last column 'value'.
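To make the molten (row-column-value) layout concrete, the sketch below converts a small wide table into that layout using base R only. The data values and the column names `geo` and `time` are illustrative assumptions; the package's own conversion may work differently.

```r
# Toy wide table: one row per country, one column per year (values made up).
wide <- data.frame(
  geo    = c("PL", "FR"),
  `2010` = c(10.5, 12.1),
  `2011` = c(10.2, 11.8),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Melt to row-column-value form: one row per (geo, time) pair,
# with the measurement in the last column 'value'.
year_cols <- setdiff(names(wide), "geo")
rcv <- data.frame(
  geo   = rep(wide$geo, times = length(year_cols)),
  time  = rep(year_cols, each = nrow(wide)),
  value = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(rcv)
```

The long layout is the one that plotting calls such as `aes(time, value)` expect, which is why the examples in this manual work with the RCV form.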
Przemyslaw Biecek
Data is downloaded from the http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing website.
See also: getEurostatTOC, getEurostatRaw, grepEurostatTOC.
## Not run:
# cast() is assumed to come from the reshape package
library(reshape)
tmp <- getEurostatRCV(kod = "educ_iste")
head(tmp)

t1 <- getEurostatRCV("rail_ac_catvict")
tmp <- cast(t1, geo ~ time, mean,
            subset = victim=="KIL" & pers_inv=="TOTAL" & accident=="TOTAL")
tmp3 <- tmp[,1:9]
rownames(tmp3) <- tmp3[,1]
tmp3 <- tmp3[c("UK", "SK", "FR", "PL", "ES", "PT", "LV"),]
matplot(2005:2012, t(tmp3[,-1]), type="o", pch=19, lty=1, las=1,
        xlab="", ylab="", yaxt="n")
axis(2, tmp3[,9], rownames(tmp3), las=1)
## End(Not run)
Download a table of contents of eurostat datasets. Note that the values in column 'code' should be used to download a selected dataset.
getEurostatTOC()
A data.frame with eight columns
title | The name of the dataset or theme. |
code | The code name of the dataset or theme; used by the getEurostatRCV and getEurostatRaw functions. |
type | Whether the entry is a dataset, folder or table. |
last.update.of.data, last.table.structure.change, data.start, data.end | Dates. |
Przemyslaw Biecek
The TOC is downloaded from http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=table_of_contents_en.txt
See also: getEurostatRCV, getEurostatRaw, grepEurostatTOC.
## Not run:
tmp <- getEurostatTOC()
head(tmp)
## End(Not run)
Get the geolocation (longitude, latitude) of a given address using the Google Maps API.
The Google Maps API is used to determine the geolocation (longitude, latitude) of a given address.
getGoogleMapsAddress(street = "Banacha 2", city = "Warszawa", country="Poland", positionOnly = TRUE, delay=1)
street | An address (street and building number). |
city | City. |
country | Country. |
positionOnly | What should be returned: a vector with longitude and latitude coordinates, or the raw result from the Google Maps API. |
delay | Number of seconds to wait between API calls. |
If positionOnly = TRUE, a vector with two values (longitude, latitude); otherwise the raw list returned by Google Maps.
Przemyslaw Biecek
The Google Maps API https://developers.google.com/maps/
## Not run:
getGoogleMapsAddress()
## End(Not run)
Download poll results from the MillwardBrown website.
getMillwardBrown()
A dataset in the molten format with poll date, party and percent of votes.
Maciej Beresewicz [data extraction] Przemyslaw Biecek [data melting]
## Not run:
getMillwardBrown()
## End(Not run)
Lists the names of eurostat datasets whose description contains a particular pattern.
This function downloads the list of all datasets available on eurostat and returns the datasets whose description contains the given pattern, e.g. all datasets related to education or teaching.
grepEurostatTOC(pattern)
pattern | Character; only datasets that contain this pattern in the description will be returned. |
A data.frame with eight columns
title | The name of the dataset or theme. |
code | The code name of the dataset or theme; used by the getEurostatRCV and getEurostatRaw functions. |
type | Whether the entry is a dataset, folder or table. |
last.update.of.data, last.table.structure.change, data.start, data.end | Dates. |
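The pattern search described here amounts to filtering the table of contents by its description column. The sketch below shows that idea on a toy table. The titles and codes are made up, the search is assumed to run on the `title` column, and case-insensitive matching is an assumption of this sketch, not documented behaviour of the package.

```r
# Toy table of contents with a subset of the documented columns
# (titles and codes are hypothetical).
toc <- data.frame(
  title = c("Pupils and students", "Rail accidents", "Education personnel"),
  code  = c("educ_a", "rail_b", "educ_c"),
  type  = c("dataset", "dataset", "table"),
  stringsAsFactors = FALSE
)

pattern <- "education"

# Keep only rows whose title matches the pattern.
hits <- toc[grep(pattern, toc$title, ignore.case = TRUE), ]
print(hits)
```

The values in the `code` column of the result are the ones that would then be passed on to a download function.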
Przemyslaw Biecek
See also: getEurostatRCV, getEurostatRaw, getEurostatTOC.
## Not run:
tmp <- grepEurostatTOC("education")
head(tmp)
## End(Not run)
This dataset is based on data from the ZPD package, see https://github.com/zozlak/ZPD and http://zpd.ibe.edu.pl/doku.php?id=obazie.
Each row shows the results for one person who took matura exams in a given year.
Przemyslaw Biecek [based on IBE / ZPD data]
## Not run:
data(maturaExam)
head(maturaExam)
## End(Not run)