| Title: | A Set of Datasets Used in My Classes or in the Book 'Modele Liniowe i Mieszane w R, Wraz z Przykladami w Analizie Danych' |
|---|---|
| Description: | A set of datasets and functions used in the book 'Modele liniowe i mieszane w R, wraz z przykladami w analizie danych'. Datasets either come from real studies or are created to be as similar as possible to real studies. |
| Authors: | Przemyslaw Biecek <[email protected]> |
| Maintainer: | Przemyslaw Biecek <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.0 |
| Built: | 2026-06-03 10:11:50 UTC |
| Source: | https://github.com/pbiecek/pbimisc |
A set of datasets and functions used in the book "Modele liniowe i mieszane w R, wraz z przykladami w analizie danych".
| Package: | PBImisc |
| Type: | Package |
| Version: | 1.0 |
| Date: | 2016-02-15 |
| License: | GPL-2 |
A set of datasets. Some of them are my original ones; some are taken from other packages or from the literature.
Przemyslaw Biecek
Maintainer: You should complain to Przemyslaw Biecek <[email protected]>
Przemyslaw Biecek, "Modele liniowe i mieszane w R, wraz z przykladami w analizie danych", 2013, Wydawnictwo PWN
# here you will find some examples
This dataset is based on blood samples from patients with acute myeloid leukemia.
data(AML)
data.frame with 66 obs. and 5 variables
Mutation - factor with 4 levels: CBFbeta, FLT3, None, Other
CD14.control - CD14 level in the control group
CD14.D3 - CD14 level after D3 treatment
CD14.1906 - CD14 level after D3 homolog 1906 treatment
CD14.2191 - CD14 level after D3 homolog 2191 treatment
Mutation - the mutated gene that causes leukemia, one of: CBFbeta, FLT3, None, Other
CD14.control, CD14.D3, CD14.1906, CD14.2191 - effects of vitamin D3 or its homologues
Artificial dataset generated to be consistent with Ewa M. study
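The AML table is stored in wide form, with one CD14 column per treatment. The sketch below shows how base R `reshape()` can turn such a table into long form. It uses a simulated stand-in (`toy`, with made-up values) rather than the packaged data, so the numbers carry no meaning; only the column naming pattern is taken from the description above.

```r
# Simulated stand-in for AML: one factor column plus four CD14.* columns.
# The values are random and purely illustrative.
set.seed(1)
toy <- data.frame(
  Mutation     = factor(rep(c("CBFbeta", "FLT3", "None", "Other"), each = 3)),
  CD14.control = rnorm(12, 10),
  CD14.D3      = rnorm(12, 14),
  CD14.1906    = rnorm(12, 13),
  CD14.2191    = rnorm(12, 15)
)

# reshape() splits each varying name at "." into a measure name (CD14)
# and a time label (control, D3, 1906, 2191).
toy_long <- reshape(toy, direction = "long", varying = colnames(toy)[2:5])

dim(toy_long)
unique(toy_long$time)
```

In the long table each original row contributes four rows, one per treatment, which is the shape that `bwplot()` and `interaction.plot()` expect.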
library(lattice)
data(AML)
AML2 = reshape(AML, direction="long", varying=colnames(AML)[2:5])
bwplot(CD14~time|Mutation, AML2)
interaction.plot(AML2$time, AML2$Mutation, AML2$CD14)
Dataset downloaded from the website http://www.oferty.net/. It contains offer and transaction prices for apartments sold in Warsaw in the years 2007-2009.
data(apartments)
data.frame with 973 obs. and 16 variables
year - year of the transaction
month - month of the transaction
surface - apartment area in m2
city - city (all transactions are from Warsaw)
district - district in which the apartment is located, factor with 28 levels
street - street on which the apartment is located
n.rooms - number of rooms
floor - floor
construction.date - the construction year
type - ownership rights
offer.price - price in the offer
transaction.price - declared price in the transaction
m2.price - price per m2
condition - apartment condition, factor with 5 levels
lat, lon - latitude and longitude coordinates of the district center
This and other related datasets can be found at http://www.oferty.net/.
website http://www.oferty.net/
data(apartments)
library(lattice)
xyplot(m2.price~construction.date|district, apartments, type=c("g","p"))
#
# apartments2 = na.omit(apartments[,c(13,1,3,5,7,8,9,10,14,15,16)])
# wsp = (bincombinations(10)==1)[-1,]
# params = matrix(0, nrow(wsp), 3)
# for (i in 1:nrow(wsp)) {
#   model = lm(m2.price~., data=apartments2[,c(TRUE,wsp[i,])])
#   params[i,1] = AIC(model, k=log(nrow(apartments2)))
#   params[i,2] = model$rank
#   params[i,3] = summary(model)$adj.r.squared
# }
# plot(params[,2], params[,3], xlab="no. of regressors", ylab="adj R^2")
#
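The commented-out part of the example fits one linear model per subset of regressors and records BIC, model rank and adjusted R^2. The following is a minimal sketch of the same idea in base R only, assuming simulated data (`d`, with hypothetical regressors `x1`, `x2`, `x3`) and using `expand.grid()` in place of `bincombinations()`, which comes from the e1071 package.

```r
# Simulated data: y depends on x1 and x2 but not on x3 (illustrative only).
set.seed(2)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)

# One row per non-empty subset of the three regressors.
subsets <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), 3)))[-1, ]
colnames(subsets) <- c("x1", "x2", "x3")

# Fit each subset and collect BIC, model rank and adjusted R^2.
res <- t(apply(subsets, 1, function(keep) {
  fit <- lm(reformulate(names(keep)[keep], response = "y"), data = d)
  c(bic    = AIC(fit, k = log(n)),
    rank   = fit$rank,
    adj.r2 = summary(fit)$adj.r.squared)
}))

# Subset with the smallest BIC.
best <- subsets[which.min(res[, "bic"]), ]
best
```

With k regressors the loop fits 2^k - 1 models, so this brute-force search is only practical for a small number of candidate variables, such as the ten used in the original example.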
boxplotpp
boxplotpp(x, xname=seq(1:ncol(x)), utitle="", addLines=TRUE, color = ifelse(addLines, "white","lightgrey"), ...)
boxplotInTime(x, xname, additional=T, color = ifelse(additional, "white","lightgrey"), main="", ylim=range(unlist(x),na.rm=T), ..., points = dim(x)[2], at = 1:points)
x |
TODO |
xname |
TODO |
utitle |
TODO |
addLines |
TODO |
color |
TODO |
additional |
TODO |
main |
TODO |
points |
TODO |
at |
TODO |
ylim |
TODO |
... |
TODO |
TODO
TODO
Przemyslaw Biecek
#TODO
Dataset from the book "Modele liniowe i mieszane w R, wraz z przykladami w analizie danych".
data(corn)
data.frame with 5339 obs. and 36 variables
A dataset with expression of 5339 genes. Each column corresponds to a single experiment. The column name encodes the experimental setup. For example, DH.C.1 refers to line DH in condition C and is the first technical replicate for this set of conditions.
Note that noise was injected into these data. To obtain the original dataset, please contact the package maintainer.
Dataset from the book "Modele liniowe i mieszane w R, wraz z przykladami w analizie danych".
Used as an example of modeling expression microarray data with mixed-effects models.
## Not run:
require(lme4)
names <- colnames(corn)
X <- t(matrix(unlist(strsplit(names, ".", fixed=T)), 3, 36))
X <- data.frame(X)
colnames(X) <- c("spec", "temp", "plant")
summary(X)
y <- corn[4662,]
lmer(y~spec*temp + (1|plant:spec:temp), data=X)
## End(Not run)
Dataset from the book "Modele liniowe i mieszane w R, wraz z przykladami w analizie danych".
data(dementia)
data.frame with 1000 obs. and 4 variables
demscore - dementia score
age - age, a factor with two levels
sex - sex, a factor with two levels
study - the source of the data, a factor with 10 levels
Dataset from the book "Modele liniowe i mieszane w R, wraz z przykladami w analizie danych".
Used as an example of mixed modeling in meta-analysis.
## Not run:
require(lme4)
modelFullI <- lmer(demscore~age*sex+(age*sex|study), data=dementia, REML=FALSE)
summary(modelFullI)
## End(Not run)
Two datasets with genotypes and phenotypes for backcrossed Drosophilas.
data(Drosophila)
Two datasets with genotypes and phenotypes for backcrossed Drosophilas.
The set of 41 markers describes genotypes while 5 variables describe phenotypes. See references for more details.
bm - a data.frame with 370 obs. and 46 variables; the first 41 are genotypes of gene markers, the last five describe phenotypes
bs - a data.frame with 402 obs. and 46 variables; the first 41 are genotypes of gene markers, the last five describe phenotypes
chr - factor giving the chromosome of each marker
pos - marker position on the chromosome, in centimorgans
The phenotype pc1 is nicely described by genotype in both backcrossed datasets.
Zhao-Bang Zeng, Jianjun Liu, Lynn F. Stam, Chen-Hung Kao, John M. Mercer, Cathy C. Laurie. Genetic Architecture of a Morphological Shape Difference Between Two Drosophila Species. Genetics, Vol. 154, 299-310, January 2000
data(Drosophila)
library(lattice)
# calculate log likelihoods
pval1 = numeric(41)
for (i in 1:41) {
  y = Drosophila$bm$pc1
  x = factor(Drosophila$bm[,i])
  pval1[i] = logLik(lm(y~x))
}
# loglikelihood plot
xyplot(pval1~pos|chr, data=Drosophila, type=c("p","l"), pch=19, ylab="log likelihood")
This dataset covers one particular aspect of the ECAP dataset. The original dataset is much richer.
data(ecap)
data.frame with 2102 obs. and 9 variables
city, district - city and district; city is a factor with nine levels, and the district effect is nested in the city effect
sex - sex
weight, height - weight and height
house.surface - surface area of the house in which the person lives
PNIF - Peak Nasal Inspiratory Flow
age - age of the person
allergenes - number of allergens
PNIF stands for Peak Nasal Inspiratory Flow
Artificial dataset generated to be consistent with ECAP (Epidemiologia Chorob Alergicznych w Polsce) study http://www.ecap.pl/
data(ecap)
library(lattice)
xyplot(PNIF~age|city, data=ecap, type=c("p","g","smooth"))
This dataset is based on the original study of European day hospital evaluation.
Artificial dataset (subset of a real dataset with some random modifications). Do not use it to draw real conclusions.
data(eden)
data.frame with 642 obs. and 12 variables
mdid - medical doctor id; there are 24 different MDs who examine patients
center - city in which the examination takes place
BPRS.Maniac, BPRS.Negative, BPRS.Positive, BPRS.Depression - BPRS stands for Brief Psychiatric Rating Scale; scores are averaged in four subscales
BPRS.Average - average from 24 questions
MANSA - scale which measures quality of life (Manchester Short Assessment of Quality of Life)
sex - sex
children - number of children
years.of.education - number of years of education
day - hospitalization mode, day or stationary
This dataset covers one particular aspect of the EDEN dataset. The original dataset is much richer.
Artificial dataset generated to be consistent with Joanna R. study.
Based on European day hospital evaluation, http://www.edenstudy.com/
data(eden)
library(lattice)
xyplot(BPRS.Average~MANSA|center, data=eden, type=c("p","g","smooth"))
Relation between graft function and elastase from nephrology study.
data(elastase)
data.frame with 54 obs. and 5 variables
sex, age, weight - patient's sex, age and weight
elastase - elastase concentration
GFR - patient's GFR (glomerular filtration rate)
Artificial dataset (real one with some random modifications). Do not use it for medical reasoning.
Artificial dataset generated to be consistent with Malgorzata L. study
data(elastase)
library(lattice)
xyplot(GFR~elastase, data=elastase, type=c("p","r","g"))
How endometriosis affects the concentration of alpha and beta factors in the blood.
data(endometriosis)
data.frame with 165 obs. and 4 variables
disease - disease; blood samples were taken from women with endometriosis or from healthy ones
phase - phase of the menstrual cycle on the examination day (proliferative or secretory)
alpha.factor, beta.factor - concentration of alpha and beta factors in blood
Dataset used as an example of ANCOVA
Artificial dataset generated to be consistent with Ula S. study
data(endometriosis)
library(lattice)
xyplot(log(alpha.factor)~log(beta.factor)|disease*phase, data=endometriosis, type=c("p", "r"))
summary(aov(alpha.factor~beta.factor*disease*phase, data=endometriosis))
This dataset covers one particular aspect of the EUNOMIA dataset. The original dataset is much richer.
data(eunomia)
data.frame with 2008 obs. and 15 variables
CENTRE13 - center in which the patient is hospitalized, factor with 13 levels
SUBJECT - patient's ID
GENDER, AGE, NUM.HOSP - gender, age and number of hospitalizations of a given patient
CAT.T1, CAT.T2, CAT.T3 - Client's Scale for Assessment of Treatment, a short assessment which measures the impact of COPD on a patient's life, measured at times T1, T2 and T3
BPRS.T1, BPRS.T2, BPRS.T3 - average score for the Brief Psychiatric Rating Scale, measured at times T1, T2 and T3
MANSA.T1, MANSA.T2, MANSA.T3 - scale which measures quality of life (Manchester Short Assessment of Quality of Life), measured at times T1, T2 and T3
ICD10 - International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10)
Artificial dataset generated to be consistent with Eunomia study (European Evaluation of Coercion in Psychiatry and Harmonisation of Best Clinical Practise)
Artificial dataset generated to be consistent with Joanna R. study.
Eunomia dataset, http://www.eunomia-study.net/
data(eunomia)
library(lattice)
bwplot(CENTRE13~BPRS.T1, data=eunomia)
xyplot(BPRS.T1~MANSA.T1|CENTRE13, data=eunomia, type=c("p","g","smooth"))
Data from National Institute of Hygiene reports. Each row corresponds to one record from the NIH.
data(flu)
data.frame with 6384 obs. and 11 variables
region - region for which the given report was taken, a factor with 16 levels
inception.no - number of flu occurrences in the given region for the given report period (one or two weeks)
inception.rate - number of flu occurrences normalized to 100k people
inception.no.0-14, inception.no.15+, inception.rate.0-14, inception.rate.15+ - absolute and normalized numbers of flu occurrences calculated for age groups 0-14 and 15+
date - date of the given report
date.id - report id; there are 38 reports per year
latitude, longitude - geographical coordinates of the region
Dataset used during the third edition of WZUR conference, see http://www.biecek.pl/WZUR3/wzurDane.html for more information.
Reports from National Institute of Public Health - National Institute of Hygiene, see: http://www.pzh.gov.pl
More information: http://www.biecek.pl/WZUR3/wzurDane.html
data(flu)
library(ggplot2)
subflu = flu[flu$region=="Mazowieckie", ]
# linear scale
qplot(date, inception.rate, data=subflu, geom="line") + scale_y_sqrt() + theme_bw()
# polar coordinates
qplot(1 + date.id*12/38, inception.rate, data=subflu, geom="path", xlab="month") +
  scale_y_sqrt() + geom_smooth(span=0.1, se=FALSE, size=2, col="red") + coord_polar() + theme_bw()
A few parameters gathered for 724 bacterial species.
data(genomes)
data.frame with 724 obs. and 7 variables
organism - organism name, unique value for every row
group - group, a factor with 22 levels
size - genome size in Mbp
GC - GC content of the genome sequence
habitat, temp.group, temperature - where does this bacterium live?
This dataset was prepared by Pawel M.; the data are taken from the NCBI repository.
See http://www.ncbi.nlm.nih.gov/ for more details
Pawel M. study
data(genomes)
library(ggplot2)
# is this relation linear?
qplot(size, GC, data=genomes) + theme_bw()
# or linear in log scales?
qplot(size, GC, data=genomes, log="xy") + theme_bw()
A dataset from "A modern approach to regression with R", Simon J. Sheather, 2009. Paired heights for husbands and wives.
data(heights)
data.frame with 96 obs. and 2 variables
Husband, Wife - height of husband and wife
The dataset from "A modern approach to regression with R", Simon J. Sheather, 2009
A modern approach to regression with R. Simon J. Sheather 2009
data(heights)
plot(Husband~Wife, data=heights, pch=19)
abline(lm(Husband~Wife, data=heights), col="red")
abline(lm(Husband~Wife-1, data=heights), col="blue")
histpp
histpp(x, xname="", utitle="")
x |
TODO |
xname |
TODO |
utitle |
TODO |
TODO
TODO
Przemyslaw Biecek
TODO
# TODO
Artificial dataset (subset from real dataset with some random modifications)
data(kidney)
data.frame with 334 obs. and 16 variables
recipient.age, donor.age - age of the recipient and the donor
CIT - cold ischemia time
discrepancy.AB, discrepancy.DR - discrepancies in AB and DR antibodies
therapy - scheme of immunosuppression
diabetes - diabetes
bpl.drugs - number of drugs for blood pressure lowering
MDRD7, MDRD30, MDRD3, MDRD6, MDRD12, MDRD24, MDRD36, MDRD60 - MDRD (Modification of Diet in Renal Disease) as an estimator of glomerular filtration rate (GFR) from serum creatinine, measured 7 and 30 days and 3, 6, 12, 24, 36 and 60 months after kidney transplantation
Example of a longitudinal study. Note that the graft survives 5 years after kidney transplantation for all patients.
Artificial dataset generated to be consistent with Maria M. study
data(kidney)
boxplotInTime(kidney[,9:16], colnames(kidney[,9:16]), additional=TRUE)
Functions for log-likelihood displacements for each observation or each level of given factor
recalculateLogLik(model, fixef = fixef(model), vcor = VarCorr(model))
groupDisp(formula, data, var)
obsDisp(formula, data, inds=1:nrow(data))
model |
a mixed model of the class mer, |
fixef, vcor |
model parameters used for the log-likelihood evaluation; if not provided, the estimates extracted from the 'model' argument are used |
formula |
a model formula that will be passed to the nlme function |
data |
a data frame |
var |
a name of grouping variable (factor) for which the group log-likelihood displacement will be performed |
inds |
indexes of observations for which observation log-likelihood displacement will be performed |
Likelihood displacement is defined as the difference between likelihoods calculated on the full dataset for two models with different sets of parameters. The first model uses ML estimates obtained from the full dataset, while the second uses ML estimates obtained from the dataset without a selected observation or group of observations.
Likelihood displacements are used in model diagnostics.
Note that these functions re-estimate coefficients in a set of models, which may be time consuming.
The function recalculateLogLik() calculates the log-likelihood for the model defined by the object model and the model parameters given in the remaining function arguments.
The functions groupDisp() and obsDisp() calculate how much the log-likelihood decreases if the selected groups or selected observations are not used for parameter estimation. Note that the log-likelihood is calculated on the full dataset.
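To make the definition concrete, here is a minimal sketch of observation-level likelihood displacement for an ordinary linear model on simulated data. It is not the package's implementation: the helper `full_loglik()` is hypothetical, a plain `lm()` stands in for the mixed model, and the displacement is taken as the plain difference of log-likelihoods (some texts multiply it by 2).

```r
# Simulated data with one deliberately shifted observation (illustrative only).
set.seed(3)
n <- 30
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
d$y[1] <- d$y[1] + 8   # make observation 1 an outlier

# Gaussian log-likelihood of the FULL data, evaluated at the ML estimates
# (coefficients and residual sd) taken from `fit`. Hypothetical helper.
full_loglik <- function(fit, data) {
  mu <- predict(fit, newdata = data)
  s  <- sqrt(mean(residuals(fit)^2))   # ML estimate of the residual sd
  sum(dnorm(data$y, mean = mu, sd = s, log = TRUE))
}

fit_all <- lm(y ~ x, data = d)
ll_all  <- full_loglik(fit_all, d)

# Displacement for observation i: refit without i, then evaluate the
# log-likelihood of the full dataset at those reduced-data estimates.
disp <- sapply(seq_len(n), function(i) {
  ll_all - full_loglik(lm(y ~ x, data = d[-i, ]), d)
})

which.max(disp)
```

Because the full-data estimates maximize the full-data likelihood, every displacement is non-negative, and observations whose removal shifts the estimates the most stand out with the largest values.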
Przemyslaw Biecek
data(eunomia)
require(lme4)
set.seed(1313)
eunomias <- eunomia[sample(1:2000,100),]
groupDisp(formula = BPRS.T2~ (1|CENTRE13), data=eunomias, var="CENTRE13")
obsDisp(formula = BPRS.T2~ (1|CENTRE13), data=eunomias, inds = 1:10)
obsDisp(formula = BPRS.T2~ (1|CENTRE13), data=eunomias)
Milk yield data for 10 unrelated cows
data(milk)
data.frame with 40 obs. and 2 variables
cow - cow id, a factor with 10 levels
milk.amount - milk amount in kg per week
Weekly milk yield for 10 cows. For every cow 5 measurements are taken.
data(milk)
library(lattice)
# change the order of levels
milk$cow = reorder(milk$cow, milk$milk.amount, mean)
# plot it
dotplot(cow~milk.amount, data=milk)
It is known that BTN3A1 (Butyrophilin subfamily 3 member A1) has a crucial function in the secretion of lipids into milk. Does an SNP mutation in it change the average milk yield?
data(milkgene)
data.frame with 1000 obs. and 5 variables
cow.id - cow id; there are 465 cows in this study
btn3a1 - btn3a1 genotype, a factor with two levels
lactation - lactation number; for some cows there are milk yields for four lactations, for others only for the first one
milk, fat - milk and fat amount in kg per lactation
Milk and fat yields for 465 cows. For every cow the genotype of btn3a1 is also measured.
Artificial dataset generated to be consistent with Joanna Sz. study
data(milkgene)
library(lattice)
xyplot(milk~fat, data=milkgene)
bwplot(milk~lactation, data=milkgene)
Dataset from the book "Modele liniowe i mieszane w R, wraz z przykladami w analizie danych".
data(musculus)
data.frame with 30 obs. and 10 variables
id - an individual id
dadid - id of the father, 0 for founders
momid - id of the mother, 0 for founders
sex - sex
sigma - maximal stress
diet - diet, D1 or D2
k1 - resilience coefficient at point 1
k2 - resilience coefficient at point 2
E1 - Young's modulus at point 1
E2 - Young's modulus at point 2
Dataset from the book "Modele liniowe i mieszane w R, wraz z przykladami w analizie danych".
Used as an example of a mixed-effects model in which the random effects have a known dependency structure, here related to the kinship coefficient.
## Not run:
require(kinship2)
pedmus <- pedigree(musculus$id, musculus$dadid, musculus$momid, musculus$sex)
plot(pedmus, affected=musculus$diet)
fam <- makefamid(musculus$id, musculus$dadid, musculus$momid)
kmatrix <- makekinship(fam, musculus$id, musculus$dadid, musculus$momid)
kmatrix[1:5,1:15]
## End(Not run)
Plot sets of groups in which means or medians are not significantly different.
The means are marked on the vertical axis. Then, in a greedy fashion, means that are not significantly different are linked by a line.
plotPairwiseTests(p.vals, means, alpha=0.05, digits=3, mar=c(2,10,3,1), ...)
p.vals |
A matrix of p-values, for example the p.value element returned by pairwise.wilcox.test() |
means |
A vector of means or medians corresponding to p.vals object (the order of groups should be the same in both objects) |
alpha |
A threshold for p.value |
digits |
Number of significant digits to be plotted with means. |
mar |
Figure margins; the left margin should be large enough to hold the names of groups |
... |
These arguments are passed to the plot function. |
Przemyslaw Biecek
data(iris)
tmp1 <- pairwise.wilcox.test(iris$Sepal.Width, iris$Species)
tmp2 <- tapply(iris$Sepal.Width, iris$Species, median, na.rm=TRUE)
plotPairwiseTests(tmp1$p.value, tmp2, alpha=0.001)
Dataset with genotypes and phenotypes for 98 patients with schizophrenia disorder.
data(schizophrenia)
data.frame with 98 obs. and 9 variables
NfkB, CD28, IFN - genotypes for SNP mutations in three selected genes
Dikeos.manic, Dikeos.reality.distortion, Dikeos.depression, Dikeos.disorganization, Dikeos.negative - Dikeos scores for schizophrenia measured in five domains
Dikeos.sum - sum of Dikeos scores
Alleles for two SNPs in genes: Nuclear Factor-Kappa Beta (NfkB) and Cluster of Differentiation 28 (CD28) were examined as well as mental health described by five scales (see Dikeos 2008 for more details).
Artificial dataset generated to be consistent with Dorota F. study
data(schizophrenia)
attach(schizophrenia)
interaction.plot(CD28, NfkB, Dikeos.sum)
interaction.plot(NfkB, CD28, Dikeos.sum)
model.tables(aov(Dikeos.sum~NfkB*CD28))
Calculation of the SCORE risk for use in the clinical management of cardiovascular risk in Europe.
calculateScoreEur(age, cholesterol, SBP, currentSmoker, gender = "Men", risk = "Low risk")
age |
age in years |
cholesterol |
in mmol/L |
SBP |
Systolic blood pressure in mmHg |
currentSmoker |
the current smoker status, 1 for current smokers, 0 for non smokers |
gender |
"Men" or "Women" |
risk |
is it "Low risk" or "High risk" group |
Calculation of SCORE is based on the paper
"Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project", R.M. Conroy et al., Eur Heart J (2003) 24 (11): 987-1003. doi: 10.1016/S0195-668X(03)00114-3
Przemyslaw Biecek
Changes in word usage in consecutive Sejm and Senate terms
data(SejmSenat)
contingency matrix with 27 rows and 8 columns
Sejm.I, Sejm.II, Sejm.III, Sejm.IV - summary of records from four Sejm terms
Senat.II, Senat.III, Senat.IV, Senat.V - summary of records from four Senate terms
adj, adja, adjp, adv, aglt, bedzie, conj, depr, fin, ger, ign, imps, impt, inf, interp, num, pact, pant, pcon, ppas, praet, pred, prep, qub, siebie, subst, winien - word modes
Word usage statistics generated from Sejm and Senat records
The IPI PAN Corpus webpage http://korpus.pl/
data(SejmSenat)
library(ca)
# can you see some patterns?
plot(ca(SejmSenat[-15,]), mass =c(TRUE,TRUE), arrows =c(FALSE,TRUE))
What is the minimal dose that is effective?
data(vaccination)
data.frame with 100 obs. and 2 variables
response - a reaction effect
dose - a dose that was applied
Responses for different doses of treatment.
Artificial dataset generated to be consistent with Karolina P. study
data(vaccination)
library(lattice)
bwplot(response~dose, data=vaccination)
Artificial dataset showing inconsistency between type I and type III tests
data(YXZ)
data.frame with 100 obs. and 3 variables
X, Z - explanatory variables
Y - response variable
See the example; results of the statistical tests are inconsistent due to correlation between the X and Z variables
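The effect can be reproduced without the packaged data. The sketch below simulates two strongly correlated regressors (the coefficients and noise levels are arbitrary choices, not those used to generate YXZ) and compares sequential (type I) sums of squares with marginal ones, which coincide with type III tests in a model without interactions.

```r
# Simulated data: Z is strongly correlated with X (illustrative values only).
set.seed(4)
n <- 100
X <- rnorm(n)
Z <- X + rnorm(n, sd = 0.3)
Y <- X + Z + rnorm(n, sd = 3)

fit <- lm(Y ~ X + Z)

# Sequential (type I) tests: each term is adjusted only for the terms before it,
# so the result depends on the order of terms in the formula.
seq_XZ <- anova(lm(Y ~ X + Z))
seq_ZX <- anova(lm(Y ~ Z + X))

# Marginal tests: each term is adjusted for all the other terms.
marg <- drop1(fit, test = "F")

seq_XZ["X", "Sum Sq"]    # X entered first
seq_ZX["X", "Sum Sq"]    # X entered after Z
marg["X", "Sum of Sq"]   # X adjusted for Z
```

The sum of squares attributed to X is large when X enters first and small once Z is already in the model; the latter equals the marginal value, which is why tables of different types can lead to different conclusions.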
Artificial dataset, generated by PBI
attach(YXZ)
summary(lm(Y~X+Z))
anova(lm(Y~Z+X))
anova(lm(Y~X))
anova(lm(Y~Z))