• About
  • Documentation

  • More Universes
  • Recent Updates
  • Leader board

  • All repositories
  • All packages
  • All articles
  • All datasets
  • All system Libraries
pbiecek
  • Builds
  • Packages
  • Articles
  • Datasets
  • Contribution
  • Badges
  • API
  • Feed

Links topbiecek

DALEX - moDel Agnostic Language for Exploration and eXplanation

Any unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection. DALEX package xrays any model and helps to explore and explain its behaviour. Machine Learning (ML) models are widely used and have various applications in classification or regression. Models created with boosting, bagging, stacking or similar techniques are often used due to their high performance. But such black-box models usually lack direct interpretability. DALEX package contains various methods that help to understand the link between input variables and model output. Implemented methods help to explore the model on the level of a single instance as well as a level of the whole dataset. All model explainers are model agnostic and can be compared across different models. DALEX package is the cornerstone for 'DrWhy.AI' universe of packages for visual model exploration. Find more details in (Biecek 2018) <https://jmlr.org/papers/v19/18-416.html>.

Last updated

black-boxdalexdata-scienceexplainable-aiexplainable-artificial-intelligenceexplainable-mlexplanationsexplanatory-model-analysisfairnessimlinterpretabilityinterpretable-machine-learningmachine-learningmodel-visualizationpredictive-modelingresponsible-airesponsible-mlxai

13.53 score 1.5k stars 19 dependents 1.5k scripts 9.4k downloads

ingredients - Effects and Importances of Model Ingredients

Collection of tools for assessment of feature importance and feature effects. Key functions are: feature_importance() for assessment of global level feature importance, ceteris_paribus() for calculation of the what-if plots, partial_dependence() for partial dependence plots, conditional_dependence() for conditional dependence plots, accumulated_dependence() for accumulated local effects plots, aggregate_profiles() and cluster_profiles() for aggregation of ceteris paribus profiles, generic print() and plot() for better usability of selected explainers, generic plotD3() for interactive, D3 based explanations, and generic describe() for explanations in natural language. The package 'ingredients' is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.

Last updated

10.67 score 38 stars 20 dependents 96 scripts 9.0k downloads

iBreakDown - Model Agnostic Instance Level Variable Attributions

Model agnostic tool for decomposition of predictions from black boxes. Supports additive attributions and attributions with interactions. The Break Down Table shows contributions of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for classification and regression models. It is an extension of the 'breakDown' package (Staniak and Biecek 2018) <doi:10.32614/RJ-2018-072>, with new and faster strategies for orderings. It supports interactions in explanations and has interactive visuals (implemented with 'D3.js' library). The methodology behind is described in the 'iBreakDown' article (Gosiewska and Biecek 2019) <arXiv:1903.11420> This package is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.

Last updated

breakdownimlinterpretabilityshapleyxai

10.30 score 84 stars 19 dependents 68 scripts 7.6k downloads

breakDown - Model Agnostic Explainers for Individual Predictions

Model agnostic tool for decomposition of predictions from black boxes. Break Down Table shows contributions of every variable to a final prediction. Break Down Plot presents variable contributions in a concise graphical way. This package work for binary classifiers and general regression models.

Last updated

data-scienceimlinterpretabilitymachine-learningvisual-explanationsxai

8.94 score 104 stars 2 dependents 100 scripts 563 downloads

archivist - Tools for Storing, Restoring and Searching for R Objects

Data exploration and modelling is a process in which a lot of data artifacts are produced. Artifacts like: subsets, data aggregates, plots, statistical models, different versions of data sets and different versions of results. The more projects we work with the more artifacts are produced and the harder it is to manage these artifacts. Archivist helps to store and manage artifacts created in R. Archivist allows you to store selected artifacts as a binary files together with their metadata and relations. Archivist allows to share artifacts with others, either through shared folder or github. Archivist allows to look for already created artifacts by using it's class, name, date of the creation or other properties. Makes it easy to restore such artifacts. Archivist allows to check if new artifact is the exact copy that was produced some time ago. That might be useful either for testing or caching.

Last updated

8.62 score 73 stars 1 dependents 136 scripts 873 downloads

localModel - LIME-Based Explanations with Interpretable Inputs Based on Ceteris Paribus Profiles

Local explanations of machine learning models describe, how features contributed to a single prediction. This package implements an explanation method based on LIME (Local Interpretable Model-agnostic Explanations, see Tulio Ribeiro, Singh, Guestrin (2016) <doi:10.1145/2939672.2939778>) in which interpretable inputs are created based on local rather than global behaviour of each original feature.

Last updated

6.80 score 14 stars 28 scripts 3.6k downloads

ddst - Data Driven Smooth Tests

Smooth tests are data driven (alternative hypothesis is dynamically selected based on data). In this package you will find two groups of smooth of test: goodness-of-fit tests and nonparametric tests for comparing distributions. Among goodness-of-fit tests there are tests for exponent, Gaussian, Gumbel and uniform distribution. Among nonparametric tests there are tests for stochastic dominance, k-sample test, test with umbrella alternatives and test for change-point problems.

Last updated

data-drivensmooth-teststatisticstest

5.84 score 7 stars 2 dependents 33 scripts 224 downloads

SmarterPoland - Tools for Accessing Various Datasets Developed by the Foundation SmarterPoland.pl

Tools for accessing and processing datasets prepared by the Foundation SmarterPoland.pl. Among all: access to API of Google Maps, Central Statistical Office of Poland, MojePanstwo, Eurostat, WHO and other sources.

Last updated

5.71 score 8 stars 2 dependents 213 scripts 412 downloads

ceterisParibus - Ceteris Paribus Profiles

Ceteris Paribus Profiles (What-If Plots) are designed to present model responses around selected points in a feature space. For example around a single prediction for an interesting observation. Plots are designed to work in a model-agnostic fashion, they are working for any predictive Machine Learning model and allow for model comparisons. Ceteris Paribus Plots supplement the Break Down Plots from 'breakDown' package.

Last updated

5.48 score 43 stars 35 scripts 727 downloads

PogromcyDanych - DataCrunchers (PogromcyDanych) is the Massive Online Open Course that Brings R and Statistics to the People

The data sets used in the online course ,,PogromcyDanych''. You can process data in many ways. The course Data Crunchers will introduce you to this variety. For this reason we will work on datasets of different size (from several to several hundred thousand rows), with various level of complexity (from two to two thousand columns) and prepared in different formats (text data, quantitative data and qualitative data). All of these data sets were gathered in a single big package called PogromcyDanych to facilitate access to them. It contains all sorts of data sets such as data about offer prices of cars, results of opinion polls, information about changes in stock market indices, data about names given to newborn babies, ski jumping results or information about outcomes of breast cancer patients treatment.

Last updated

5.46 score 8 stars 1 dependents 243 scripts 550 downloads

drifter - Concept Drift and Concept Shift Detection for Predictive Models

Concept drift refers to the change in the data distribution or in the relationships between variables over time. 'drifter' calculates distances between variable distributions or variable relations and identifies both types of drift. Key functions are: calculate_covariate_drift() checks distance between corresponding variables in two datasets, calculate_residuals_drift() checks distance between residual distributions for two models, calculate_model_drift() checks distance between partial dependency profiles for two models, check_drift() executes all checks against drift. 'drifter' is a part of the 'DrWhy.AI' universe (Biecek 2018) <arXiv:1806.08915>.

Last updated

concept-driftconcept-shiftdrwhypredictive-modeling

4.45 score 19 stars 1 dependents 5 scripts 244 downloads

bgmm - Gaussian Mixture Modeling Algorithms and the Belief-Based Mixture Modeling

Two partially supervised mixture modeling methods: soft-label and belief-based modeling are implemented. For completeness, we equipped the package also with the functionality of unsupervised, semi- and fully supervised mixture modeling. The package can be applied also to selection of the best-fitting from a set of models with different component numbers or constraints on their structures. For detailed introduction see: Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software <doi:10.18637/jss.v047.i03>.

Last updated

4.18 score 2 stars 1 dependents 51 scripts 98 downloads

PBImisc - A Set of Datasets Used in My Classes or in the Book 'Modele Liniowe i Mieszane w R, Wraz z Przykladami w Analizie Danych'

A set of datasets and functions used in the book 'Modele liniowe i mieszane w R, wraz z przykladami w analizie danych'. Datasets either come from real studies or are created to be as similar as possible to real studies.

Last updated

4.00 score 1 dependents 67 scripts 217 downloads

proton - The Proton Game

'The Proton Game' is a console-based data-crunching game for younger and older data scientists. Act as a data-hacker and find Slawomir Pietraszko's credentials to the Proton server. You have to solve four data-based puzzles to find the login and password. There are many ways to solve these puzzles. You may use loops, data filtering, ordering, aggregation or other tools. Only basics knowledge of R is required to play the game, yet the more functions you know, the more approaches you can try. The knowledge of dplyr is not required but may be very helpful. This game is linked with the ,,Pietraszko's Cave'' story available at http://biecek.pl/BetaBit/Warsaw. It's a part of Beta and Bit series. You will find more about the Beta and Bit series at http://biecek.pl/BetaBit.

Last updated

2.65 score 448 scripts 169 downloads

BetaBit - Mini Games from Adventures of Beta and Bit

Three games: proton, frequon and regression. Each one is a console-based data-crunching game for younger and older data scientists. Act as a data-hacker and find Slawomir Pietraszko's credentials to the Proton server. In proton you have to solve four data-based puzzles to find the login and password. There are many ways to solve these puzzles. You may use loops, data filtering, ordering, aggregation or other tools. Only basics knowledge of R is required to play the game, yet the more functions you know, the more approaches you can try. In frequon you will help to perform statistical cryptanalytic attack on a corpus of ciphered messages. This time seven sub-tasks are pushing the bar much higher. Do you accept the challenge? In regression you will test your modeling skills in a series of eight sub-tasks. Try only if ANOVA is your close friend. It's a part of Beta and Bit project. You will find more about the Beta and Bit project at <https://github.com/BetaAndBit/Charts>.

Last updated

2.02 score 1 stars 104 scripts 332 downloads

Przewodnik - Datasets and Functions Used in the Book 'Przewodnik po Pakiecie R'

Data sets and functions used in the polish book "Przewodnik po pakiecie R" (The Hitchhiker's Guide to the R). See more at <http://biecek.pl/R>. Among others you will find here data about housing prices, cancer patients, running times and many others.

Last updated

1.90 score 16 scripts 271 downloads