The archivist package is very efficient and advantageous when archived artifacts were created with a chaining code offered by the magrittr package. It is highly useful because the origin of the artifact is archived, which means that the artifact can be easily reproduced and it’s origin code is stored for future use.
Below are examples of creating artifacts with a chaining code, that
requires using a %>%
and a %.%
operators,
offered by the magrittr and the dplyr
package.
Since the version 1.5 of the magrittr package has
changed functionality of a mentioned pipe operator %>%
,
we copied (in version 1.3 of the archivist)
functionality from version 1.0.1 and added old operator to the
archivist package as %a%
operator.
Let us prepare a Repository where archived artifacts will be stored.
Then one might create artifacts like those below. The code lines are
ordered in chaining code, which will be used by the asave
function to store an artifact and archive it’s origin code as a
name
of this artifact.
# example 1
library(dplyr)
data("hflights", package = "hflights")
hflights %a%
group_by(Year, Month, DayofMonth) %a%
select(Year:DayofMonth, ArrDelay, DepDelay) %a%
summarise(
arr = mean(ArrDelay, na.rm = TRUE),
dep = mean(DepDelay, na.rm = TRUE)
) %a%
filter(arr > 30 | dep > 30) -> example1
One may see a vast difference in code evaluation when using chaining
code. Here is an example of a traditional R
call and one
that uses the chaining code
philosophy.
# example 2
library(Lahman)
# Traditional R code
players <- group_by(Batting, playerID)
games <- summarise(players, total = sum(G))
head(arrange(games, desc(total)), 5)
Source: local data frame [5 x 2]
playerID total
(chr) (int)
1 rosepe01 3562
2 yastrca01 3308
3 aaronha01 3298
4 henderi01 3081
5 cobbty01 3035
setLocalRepo
To simplify the code one can set globally the path to
Repository using code as below. Now one no longer need
to specify the repoDir
parameter with every call.
Many of various operations can be performed on a single
data.frame
before one considers to archive these artifacts.
Archivist guarantees that all of them will be
archived
, which means a code alone will no longer be needed
to be stored in a separate file. Also an artifact may be saved during
operations are performed and used in further code evaluations. This can
be done when argument in asave
is specified.
# example 3
aread('MarcinKosinski/Museum/3374db20ecaf2fa0d070d') -> crime.by.state
crime.by.state %a%
filter(State=="New York", Year==2005) %a%
arrange(desc(Count)) %a%
select(Type.of.Crime, Count) %a%
mutate(Proportion=Count/sum(Count)) %a%
asave( exampleRepoDir, value = TRUE) %a%
group_by(Type.of.Crime) %a%
summarise(num.types = n(), counts = sum(Count)) %a%
asave( )
Dozens of artifacts may now be stored in one Repository. Every artifact may have an additional Tag specified by an user. This will simplify searching for this artifact in the future.
# results = 'asis'
hash2 <- searchInLocalRepo( pattern = "operations on diamonds" )
ahistory(hash2, format = "kable")
call | md5hash | |
---|---|---|
5 | env[[nm]] | 926dab1fe6e71b197a17909fcd0e5995 |
4 | group_by(cut, clarity, color) | 860466a792815080957a34021d04c5c6 |
3 | summarize(meancarat = mean(carat, na.rm = TRUE), ndiamonds = length(carat)) | 820c5bf2ce98bbb4b787830fe52d98f3 |
2 | head(10) | 33576ab6aa88e5aedb4887aeffd84fa9 |
1 | asave(userTags = c(“tags”, “operations on diamonds”)) | 434d4891ac1569883f80b2ec9fef0b95 |