Title: | TJ's Miscellany |
---|---|
Description: | A collection of helper functions. |
Authors: | Tristan Mahr [aut, cre] |
Maintainer: | Tristan Mahr <[email protected]> |
License: | GPL-3 |
Version: | 0.0.0.9000 |
Built: | 2024-10-29 03:16:48 UTC |
Source: | https://github.com/tjmahr/tjmisc |
Annotating plots with a grey background
annotate_label_grey( label, x, y, size = 4, fill = "#EBEBEB99", hjust = 0, vjust = 0, label.size = 0, ... )
label | Text to write on the plot. |
x, y | x and y positions. |
size, fill, hjust, vjust, label.size | Plotting aesthetics that this function handles. They can be overridden. |
... | Other parameters to pass on to |
An annotation layer for a ggplot2 plot.
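The default fill, "#EBEBEB99", is an eight-digit hex color. The first six digits give the grey, and the last two give the alpha channel. A quick base-R check with grDevices::col2rgb() decodes the channels. Reading 0x99 as 60% opacity is my own arithmetic. The remark that "#EBEBEB" is R's "grey92", which as far as I know is the panel fill of ggplot2's default theme_grey(), is also mine and not stated in the package.

```r
# Decode the default fill of annotate_label_grey() into RGBA channels (0-255).
rgba <- grDevices::col2rgb("#EBEBEB99", alpha = TRUE)
print(rgba)

# Opacity as a proportion: 0x99 = 153, and 153 / 255 = 0.6
opacity <- rgba["alpha", 1] / 255
print(opacity)

# "#EBEBEB" is the same grey as R's named color "grey92"
same_grey <- identical(grDevices::col2rgb("grey92")[, 1], rgba[1:3, 1])
print(same_grey)
```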
Compare pairs of categorical variables
compare_pairs(data, levels, values, f = `-`)
data | a dataframe |
levels | a column with a categorical variable. All pairs of values in |
values | a column with values to compare. |
f | comparison function to apply to values in each pair. Defaults to `-` (subtraction). |
a dataframe with pairwise comparisons
to_compare <- nlme::Machines %>%
  dplyr::group_by(Worker) %>%
  dplyr::summarise(avg_score = mean(score)) %>%
  print()

to_compare %>%
  compare_pairs(Worker, avg_score) %>%
  dplyr::rename(difference = value) %>%
  dplyr::mutate_if(is.numeric, round, 1)
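To make pairwise comparison concrete, here is a minimal base-R sketch. It is not the package's implementation. The function name compare_pairs_sketch(), the column names pair and value, and the pair ordering produced by utils::combn() are all assumptions chosen for illustration. The name value mirrors the rename() call in the example.

```r
# Hypothetical base-R sketch of pairwise comparisons (not tjmisc's code).
compare_pairs_sketch <- function(groups, values, f = `-`) {
  # Every unordered pair of positions, one pair per column of `pairs`
  pairs <- utils::combn(seq_along(groups), 2)
  data.frame(
    pair = paste(groups[pairs[1, ]], groups[pairs[2, ]], sep = "-"),
    # f() is applied as f(first member, second member)
    value = f(values[pairs[1, ]], values[pairs[2, ]]),
    stringsAsFactors = FALSE
  )
}

scores <- data.frame(worker = c("A", "B", "C"), avg = c(10, 12, 15),
                     stringsAsFactors = FALSE)
result <- compare_pairs_sketch(scores$worker, scores$avg)
print(result)
```

Passing another function as f, such as `/`, would give ratios instead of differences.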
Compare two vectors using R's set operations
compare_sets(x, y)
x, y | vectors to compare |
a list with lengths (the lengths of the other elements), x, y, unique(x), unique(y), setequal(x, y), setdiff(x, y), setdiff(y, x), intersect(x, y), and union(x, y).
yours <- c(1, 2, 3, 4, 4)
mine <- c(3, 5, 6, 4)
compare_sets(yours, mine)
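The list returned by compare_sets() can be approximated with base R's set functions. This sketch is hypothetical. The element names (unique_x, x_not_y, and so on) are my own labels and are not necessarily the ones the package uses.

```r
# Hypothetical base-R sketch of compare_sets(); element names are illustrative.
compare_sets_sketch <- function(x, y) {
  results <- list(
    x = x,
    y = y,
    unique_x = unique(x),
    unique_y = unique(y),
    setequal = setequal(x, y),
    x_not_y = setdiff(x, y),
    y_not_x = setdiff(y, x),
    intersect = intersect(x, y),
    union = union(x, y)
  )
  # Put the length of every other element at the front of the list
  c(list(lengths = lengths(results)), results)
}

yours <- c(1, 2, 3, 4, 4)
mine <- c(3, 5, 6, 4)
cmp <- compare_sets_sketch(yours, mine)
str(cmp)
```

The lengths element makes duplicates easy to spot: x has 5 elements, but unique_x has only 4.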
These functions strip away code and non-prose elements before counting words.
count_words_in_rmd_file(path) count_words_in_rmd_lines(lines) simplify_rmd_lines(lines)
path | path to an Rmarkdown file |
lines | a character vector of text (from an Rmarkdown file) |
The helper function simplify_rmd_lines() strips down an Rmarkdown file so that code and other non-prose elements do not contribute to the word count. It does the following:
- Removes all lines that fall between a pair of ```` lines. (These are sometimes used to show verbatim text from blocks with three tick marks.)
- Removes all lines that fall between a pair of ``` lines.
- Merges lines that end with `r with the following line.
- Replaces inline code spans with a single word (`code`).
- Deletes single-line HTML comments.

These steps are very ad hoc; I update and expand them as I run into new things that need to be excluded from my word counts. Let's not pretend that this thing is at all comprehensive.
The word count is computed by stringi::stri_stats_latex().
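A simplified base-R sketch of a few of these steps follows. It is hypothetical and deliberately partial. It treats ```` lines like any other fence, it skips the rule for lines ending in `r, and it counts words with a crude regular expression instead of stringi::stri_stats_latex(). Its counts will therefore not match the package's.

```r
# Hypothetical, partial sketch of the clean-up steps (not tjmisc's code).
simplify_rmd_sketch <- function(lines) {
  # A line is inside a fenced block if an odd number of fences precede it;
  # the fence lines themselves are dropped too
  is_fence <- grepl("^```", lines)
  inside <- cumsum(is_fence) %% 2 == 1
  lines <- lines[!(inside | is_fence)]
  # Replace inline code spans with the single word "code"
  lines <- gsub("`[^`]+`", "code", lines)
  # Delete single-line HTML comments (non-greedy match)
  gsub("<!--.*?-->", "", lines, perl = TRUE)
}

# Crude word count: runs of letters, digits, and apostrophes
count_words_sketch <- function(lines) {
  words <- regmatches(lines, gregexpr("[[:alnum:]']+", lines))
  sum(lengths(words))
}

rmd <- c(
  "Some prose with `inline()` code.",
  "```{r}",
  "x <- rnorm(10)",
  "```",
  "<!-- a note -->More prose."
)
simple <- simplify_rmd_sketch(rmd)
print(simple)
count_words_sketch(simple)
```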
The counting functions return a data-frame with the counts of words, characters in words, and whitespace characters. simplify_rmd_lines() returns a character vector of simplified Rmarkdown lines.
Format the labels of a factor
fct_glue_labels(xs, fmt = "{levels}", first_fmt = fmt) fct_add_counts(xs, fmt = "{levels} ({counts})", first_fmt = fmt)
xs | a factor |
fmt | glue-style format to use. Defaults to "{levels}" for fct_glue_labels() and "{levels} ({counts})" for fct_add_counts(). |
first_fmt | glue-style format to use for the very first label. Defaults to the value of fmt. |
At this point, only the magic variables "{levels}" and "{counts}" are available. In principle, others could be defined.
fct_add_counts() is a special case of fct_glue_labels().
a factor with the labels updated
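The label formatting can be sketched without glue by substituting the two magic variables as fixed strings. This is a hypothetical stand-in for fct_add_counts() and not its code. It swaps glue for plain gsub() and only understands {levels} and {counts}.

```r
# Hypothetical stand-in for fct_add_counts() that avoids the glue package.
add_counts_sketch <- function(xs, fmt = "{levels} ({counts})", first_fmt = fmt) {
  levs <- levels(xs)
  # table() on a factor returns one count per level, in level order
  counts <- as.character(as.vector(table(xs)))
  fill <- function(template, lev, n) {
    out <- gsub("{levels}", lev, template, fixed = TRUE)
    gsub("{counts}", n, out, fixed = TRUE)
  }
  labels <- mapply(fill, fmt, levs, counts, USE.NAMES = FALSE)
  # The very first label can use its own format
  labels[1] <- fill(first_fmt, levs[1], counts[1])
  levels(xs) <- labels
  xs
}

xs <- factor(c("a", "b", "b", "c", "c", "c"))
relabelled <- add_counts_sketch(xs, first_fmt = "{levels} (n = {counts})")
levels(relabelled)
```

A separate first_fmt is handy for plot legends, where only the first label needs to spell out what the number means.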
Creates plots of matrices like graphics::matplot(), but uses ggplot2, draws lines by default, and can use a specified column for the x-axis.
ggmatplot(x, x_axis_column = NULL, n_colors = 6, unique_rows = TRUE)
x | A matrix. |
x_axis_column | Index (number) of the column to plot on the x-axis. Defaults to NULL. |
n_colors | Number of colors to cycle through. Defaults to 6. |
unique_rows | Whether to first take the unique rows of the matrix. Defaults to TRUE. |
a ggplot2 plot.
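ggplot2 works with long data, so a matplot-style plot first has to reshape the matrix so that each column becomes its own series. This hypothetical helper shows one way to do that in base R. Using row numbers for the x-axis when x_axis_column is NULL mirrors graphics::matplot(), but it is an assumption about ggmatplot(). The helper also ignores n_colors.

```r
# Hypothetical reshaping step for a matplot-style ggplot2 plot (not tjmisc's code).
matrix_to_long_sketch <- function(x, x_axis_column = NULL, unique_rows = TRUE) {
  if (unique_rows) x <- unique(x)   # unique() on a matrix keeps unique rows
  if (is.null(x_axis_column)) {
    x_values <- seq_len(nrow(x))    # assumption: row number as the x-axis
    y <- x
  } else {
    x_values <- x[, x_axis_column]
    y <- x[, -x_axis_column, drop = FALSE]
  }
  # as.vector() stacks the columns, so each series occupies a contiguous block
  data.frame(
    x = rep(x_values, times = ncol(y)),
    series = rep(seq_len(ncol(y)), each = nrow(y)),
    y = as.vector(y)
  )
}

m <- cbind(1:3, c(2, 4, 6), c(3, 6, 9))
long <- matrix_to_long_sketch(m, x_axis_column = 1)
print(long)
```

The long table could then be drawn with something like ggplot2::ggplot(long, ggplot2::aes(x, y, group = series)) + ggplot2::geom_line().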
Preview a plot using ggsave()
This function saves a plot to a temporary file with ggsave()
and opens the
temporary file in the system viewer. This function is useful for quickly
previewing how a plot will look when it is saved to a file.
ggpreview(..., device = "png")
... | options passed on to |
device | the file extension of the device to use. Defaults to "png". |
Check for locally repeating values
is_same_as_last(xs) replace_if_same_as_last(xs, replacement = "")
xs | a vector |
replacement | a value used to replace a repeated value. Defaults to "" (an empty string). |
is_same_as_last() returns TRUE when xs[n] is the same as xs[n-1].
xs <- c("a", "a", "a", NA, "b", "b", "c", NA, NA)
is_same_as_last(xs)
replace_if_same_as_last(xs, "")
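Here is a minimal base-R sketch of the two functions that compares each element with its predecessor. The help page does not say how NA is treated, so this version simply counts the first element and any comparison involving NA as "not repeated". The _sketch names are hypothetical.

```r
# Hypothetical sketches; the NA handling is an assumption.
is_same_as_last_sketch <- function(xs) {
  if (length(xs) == 0) return(logical(0))
  # Compare each element with the one before it; the first has no predecessor
  same <- c(FALSE, xs[-1] == xs[-length(xs)])
  # Comparisons involving NA yield NA; treat them as "not repeated"
  same[is.na(same)] <- FALSE
  same
}

replace_if_same_as_last_sketch <- function(xs, replacement = "") {
  xs[is_same_as_last_sketch(xs)] <- replacement
  xs
}

xs <- c("a", "a", "a", NA, "b", "b", "c", NA, NA)
is_same_as_last_sketch(xs)
replace_if_same_as_last_sketch(xs, "")
```

Blanking repeats like this is a common trick for printing tables where a grouping column should only show its value once per block.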
This is the function I use to create new posts for my website.
jekyll_create_rmd_draft( slug = NULL, date = NULL, dir_drafts = "./_R/_drafts", open = TRUE )
slug | A "slug" to use for the post. Should be a string consisting of |
date | Date string to use for the post. Defaults to NULL. |
dir_drafts | Relative path to the folder in which to store the drafts. Defaults to "./_R/_drafts". |
open | Whether to open the file for editing when using RStudio. Defaults to TRUE. |
The path to the created file is invisibly returned.
Randomly sample data from n sub-groups of data
sample_n_of(data, size, ...)
data | a dataframe |
size | number of groups to sample |
... | variables to group by |
the data from the sampled subgroups
sample_data <- tibble::tibble(
  letter = rep(letters, 5),
  color = rep(c("red", "green", "yellow", "orange", "blue"), 26),
  value = rnorm(26 * 5)
)

# data from two letters
sample_data %>% sample_n_of(2, letter)

# data from two colors
sample_data %>% sample_n_of(2, color)

# data from 10 letter-color pairs
sample_data %>% sample_n_of(10, letter, color)
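Sampling n subgroups, rather than n rows, can be sketched in base R in three steps: build a key for each row's group, sample some keys, and keep every row whose key was drawn. This is a hypothetical illustration. It takes a character vector of grouping column names instead of the package's unquoted ... interface.

```r
# Hypothetical sketch: sample `size` distinct groups and keep all their rows.
sample_n_of_sketch <- function(data, size, group_cols) {
  # One key per row identifying its group (combination of grouping columns)
  keys <- interaction(data[group_cols], drop = TRUE)
  chosen <- sample(levels(keys), size)
  data[keys %in% chosen, , drop = FALSE]
}

sample_data <- data.frame(
  letter = rep(letters, 5),
  color = rep(c("red", "green", "yellow", "orange", "blue"), 26),
  value = rnorm(26 * 5)
)
set.seed(1)
two_letters <- sample_n_of_sketch(sample_data, 2, "letter")
ten_pairs <- sample_n_of_sketch(sample_data, 10, c("letter", "color"))
```

Each letter appears five times, so two sampled letters yield ten rows. Each letter-color pair appears exactly once, so ten sampled pairs also yield ten rows.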
Create a sequence along the rows of a dataframe
seq_along_rows(data)
data | a dataframe |
a sequence of integers along the rows of a dataframe
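A likely base-R equivalent is seq_len(nrow(data)). That one-liner is an assumption and is not taken from the package source. Its advantage over 1:nrow(data) shows up with zero-row data frames, where 1:0 produces c(1, 0) and a loop would run twice instead of not at all.

```r
# Hypothetical equivalent of seq_along_rows(); not taken from the package source.
seq_along_rows_sketch <- function(data) {
  seq_len(nrow(data))
}

three_rows <- data.frame(id = c("x", "y", "z"))
no_rows <- three_rows[0, , drop = FALSE]
seq_along_rows_sketch(three_rows)   # 1 2 3
seq_along_rows_sketch(no_rows)      # integer(0)
1:nrow(no_rows)                     # 1 0, usually a bug in loops
```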
Which lines fall in between a delimiter pattern
str_which_between(string, pattern)
string | a character vector |
pattern | a regular expression pattern to look for |
the lines that are contained between pairs of delimiter patterns
string <- paste(
  "```{r}",
  "# some code",
  "```",
  "Here is more code.",
  "```markdown",
  "**bold!**",
  "```",
  sep = "\n"
)
lines <- unlist(strsplit(string, "\n"))
str_which_between(lines, "^```")
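The idea behind str_which_between() can be sketched with a running count of delimiter lines. A line sits inside a block when an odd number of delimiters have appeared so far. This hypothetical version excludes the delimiter lines themselves; the help page does not say whether the package includes them.

```r
# Hypothetical sketch; excludes the delimiter lines themselves.
str_which_between_sketch <- function(string, pattern) {
  is_delim <- grepl(pattern, string)
  # Odd running count = an opening delimiter has been seen but not yet closed
  inside <- (cumsum(is_delim) %% 2 == 1) & !is_delim
  which(inside)
}

lines <- c(
  "```{r}",
  "# some code",
  "```",
  "Here is more code.",
  "```markdown",
  "**bold!**",
  "```"
)
str_which_between_sketch(lines, "^```")
```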
This function respects groupings from dplyr::group_by(). When the dataframe contains grouped data, the correlations are computed within each subgroup of data.
tidy_correlation(data, ..., type = c("pearson", "spearman"))
data | a dataframe |
... | columns to select, using |
type | type of correlation, either "pearson" or "spearman". |
a long dataframe (a tibble) with correlations calculated for each pair of columns.
tidy_correlation(ChickWeight, -Chick, -Diet)
tidy_correlation(ChickWeight, weight, Time)

ChickWeight %>%
  dplyr::group_by(Diet) %>%
  tidy_correlation(weight, Time)
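Without dplyr, the long output can be approximated in two steps: compute the full correlation matrix, then keep each unordered pair of columns once. This hypothetical sketch ignores grouping and column selection. The output column names column1, column2, and estimate are my own.

```r
# Hypothetical sketch: correlations in long format, one row per column pair.
tidy_correlation_sketch <- function(data, type = c("pearson", "spearman")) {
  type <- match.arg(type)
  m <- stats::cor(data, method = type)
  # Each unordered pair of column names, one pair per column of `pairs`
  pairs <- utils::combn(colnames(m), 2)
  data.frame(
    column1 = pairs[1, ],
    column2 = pairs[2, ],
    # Index the matrix with a two-column (row name, column name) matrix
    estimate = m[t(pairs)],
    stringsAsFactors = FALSE
  )
}

d <- data.frame(a = 1:5, b = c(2, 4, 6, 8, 10), c = c(5, 4, 3, 2, 1))
tidy_correlation_sketch(d)
tidy_correlation_sketch(d, type = "spearman")
```

Keeping only the upper triangle avoids the redundant a-b / b-a rows and the trivial self-correlations on the diagonal.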
This function respects groupings from dplyr::group_by(). When the dataframe contains grouped data, the quantiles are computed within each subgroup of data.
tidy_quantile(data, var, probs = seq(0.1, 0.9, 0.2))
data | a dataframe |
var | a column in the dataframe |
probs | quantiles to return. Defaults to seq(0.1, 0.9, 0.2). |
a long dataframe (a tibble) with quantiles for the variable.
tidy_quantile(sleep, extra)

sleep %>%
  dplyr::group_by(group) %>%
  tidy_quantile(extra)
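Grouped quantiles in long format can be sketched with split() and stats::quantile(). This is a hypothetical base-R illustration, not the package code. It takes plain vectors rather than a grouped dataframe, and the output column names group, quantile, and value are assumptions.

```r
# Hypothetical sketch of grouped quantiles in long format (not tjmisc's code).
tidy_quantile_sketch <- function(values, groups, probs = seq(0.1, 0.9, 0.2)) {
  # One small data frame of quantiles per group
  pieces <- lapply(split(values, groups), function(v) {
    data.frame(quantile = probs, value = unname(stats::quantile(v, probs)))
  })
  out <- do.call(rbind, pieces)
  out$group <- rep(names(pieces), each = length(probs))
  rownames(out) <- NULL
  out[c("group", "quantile", "value")]
}

vals <- c(1:11, 101:111)
grp <- rep(c("a", "b"), each = 11)
res <- tidy_quantile_sketch(vals, grp)
print(res)
```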
Colors I like
tjm_colors
An object of class list of length 8.