| Title: | Basic Pattern Analysis |
|---|---|
| Description: | Run basic pattern analyses on character sets, digits, or combined input containing both characters and numeric digits. Useful for data cleaning and for identifying columns containing multiple or nonstandard formats. |
| Authors: | Brandon Greenwell [aut, cre] |
| Maintainer: | Brandon Greenwell <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.1.1.9000 |
| Built: | 2026-05-31 09:40:19 UTC |
| Source: | https://github.com/bgreenwell/bpa |
Perform a basic pattern analysis

    get_pattern(x, show_ws = TRUE, ws_char = "w")

    basic_pattern_analysis(x, unique_only = FALSE, show_ws = TRUE, ws_char = "w",
                           useNA = c("no", "ifany", "always"), ...)

    ## Default S3 method:
    basic_pattern_analysis(x, unique_only = FALSE, show_ws = TRUE, ws_char = "w",
                           useNA = c("no", "ifany", "always"), ...)

    ## S3 method for class 'data.frame'
    basic_pattern_analysis(x, unique_only = FALSE, show_ws = TRUE, ws_char = "w",
                           useNA = c("no", "ifany", "always"), ...)

    bpa(x, ...)


| Argument | Description |
|---|---|
| x | A data frame or character vector. |
| show_ws | Logical indicating whether or not to show whitespace using a special character. Default is TRUE. |
| ws_char | Character string to use to depict whitespace when show_ws = TRUE. Default is "w". |
| unique_only | Logical indicating whether or not to only show the unique patterns. Default is FALSE. |
| useNA | Character string indicating whether to include NA values; one of "no", "ifany", or "always". |
| ... | Additional optional arguments to be passed on. |


    basic_pattern_analysis(iris)
    basic_pattern_analysis(iris, unique_only = TRUE)

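
To make the idea concrete, here is a minimal base-R sketch of what a pattern analysis amounts to: encode each value as a pattern, then tabulate the patterns. The helper `sketch_pattern` and the `dates` vector are hypothetical and only approximate the encoding described in this manual (digits as "9", lowercase letters as "a", uppercase letters as "A", whitespace as `ws_char`); this is not the package's own implementation.

```r
# Hypothetical helper (not part of bpa): approximate the pattern encoding
# with base-R regular expressions.
sketch_pattern <- function(x, show_ws = TRUE, ws_char = "w") {
  x <- as.character(x)
  x <- gsub("[[:digit:]]", "9", x)
  x <- gsub("[[:lower:]]", "a", x)
  x <- gsub("[[:upper:]]", "A", x)
  # Replace whitespace last so that ws_char itself is not re-encoded
  if (show_ws) x <- gsub("[[:space:]]", ws_char, x)
  x
}

# Made-up input for illustration only
dates <- c("2016-01-05", "Jan 5, 2016", "2016-02-11", NA)

patterns <- sketch_pattern(dates)
counts <- table(patterns, useNA = "ifany")  # frequency of each pattern
print(counts)
```

Tabulating with `useNA = "ifany"` keeps missing values visible in the counts, which is usually what one wants when auditing a column.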
Extract values from a vector that match a particular pattern.

    match_pattern(x, pattern, unique_only = FALSE, ...)


| Argument | Description |
|---|---|
| x | A vector, typically of class "character". |
| pattern | Character string specifying the particular pattern to match. |
| unique_only | Logical indicating whether or not to only return unique values. Default is FALSE. |
| ... | Additional optional arguments to be passed on. |

The pattern specified by the required argument pattern must be a valid
pattern as produced by the get_pattern function. That is, every digit
is represented by "9", every lowercase letter by "a", every uppercase
letter by "A", and so on.

    phone <- c("123-456-7890", "456-7890", "123-4567", "456-7890")
    match_pattern(phone, pattern = "999-9999")
    match_pattern(phone, pattern = "999-9999", unique_only = TRUE)

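
The matching step can be sketched in base R as follows: encode every value, then keep those whose encoding equals the requested pattern. The helpers `to_pattern` and `sketch_match` are hypothetical stand-ins written for this illustration, and the sketch leaves whitespace unencoded for brevity; the package's actual behavior may differ in details.

```r
# Hypothetical helper (not part of bpa): approximate pattern encoding
to_pattern <- function(x) {
  x <- gsub("[[:digit:]]", "9", as.character(x))
  x <- gsub("[[:lower:]]", "a", x)
  gsub("[[:upper:]]", "A", x)
}

# Hypothetical helper: keep the values whose encoded pattern equals `pattern`
sketch_match <- function(x, pattern, unique_only = FALSE) {
  hits <- x[!is.na(x) & to_pattern(x) == pattern]
  if (unique_only) unique(hits) else hits
}

phone <- c("123-456-7890", "456-7890", "123-4567", "456-7890")
all_hits <- sketch_match(phone, pattern = "999-9999")
unique_hits <- sketch_match(phone, pattern = "999-9999", unique_only = TRUE)
print(all_hits)
print(unique_hits)
```

Because the comparison is on the encoded string, a pattern such as "999-9999" selects every seven-digit number in that layout regardless of the actual digits.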
Simulated (messy) data set to help illustrate some of the uses of basic pattern analysis.
A data frame with 1000 rows and 3 variables:

| Variable | Description |
|---|---|
| Gender | Gender in various formats. |
| Date | Dates in various formats. |
| Phone | Phone numbers in various formats. |


    data(messy)
    bpa(messy, unique_only = TRUE, ws_char = " ")

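
One way to use pattern analysis for data cleaning is to count the distinct patterns in each column and flag the columns that have more than one. The base-R sketch below does this on a small made-up data frame; `encode`, `toy`, and its columns are hypothetical and are not taken from the messy data set or from the package.

```r
# Hypothetical helper (not part of bpa): approximate pattern encoding
encode <- function(x) {
  x <- gsub("[[:digit:]]", "9", as.character(x))
  x <- gsub("[[:lower:]]", "a", x)
  gsub("[[:upper:]]", "A", x)
}

# Made-up data for illustration; this is not the messy data set
toy <- data.frame(
  Gender = c("M", "F", "Male", "female"),
  Zip = c("12345", "54321", "10001", "90210"),
  stringsAsFactors = FALSE
)

# Number of distinct patterns per column
n_patterns <- vapply(toy, function(col) length(unique(encode(col))), integer(1))

# Columns with more than one pattern are candidates for cleaning
mixed <- names(n_patterns)[n_patterns > 1]
print(n_patterns)
print(mixed)
```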
Remove leading and/or trailing whitespace from character strings.

    trim_ws(x, which = c("both", "left", "right"))


| Argument | Description |
|---|---|
| x | A data frame or vector. |
| which | A character string specifying whether to remove both leading and trailing whitespace (default), or only leading ("left") or trailing ("right") whitespace. |


    # Toy example
    d <- data.frame(x = c(" a ", "b ", "c"), y = c(" 1 ", "2", " 3"), z = c(4, 5, 6))
    print(d)  # print data as is
    trim_ws(d)  # print data with whitespace trimmed off
    sapply(trim_ws(d), class)  # check that column types are preserved

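
For comparison, a similar effect can be sketched with base R's `trimws()`, applied only to the character columns so that other column types are left untouched. The function `sketch_trim` is a hypothetical illustration and is not how the package necessarily implements `trim_ws`; for example, it does not handle factor columns.

```r
# Hypothetical helper (not part of bpa): trim character columns of a data frame
sketch_trim <- function(d, which = c("both", "left", "right")) {
  which <- match.arg(which)
  is_chr <- vapply(d, is.character, logical(1))
  # Only character columns are modified; numeric columns keep their type
  d[is_chr] <- lapply(d[is_chr], trimws, which = which)
  d
}

d <- data.frame(x = c(" a ", "b ", "c"), z = c(4, 5, 6),
                stringsAsFactors = FALSE)
out <- sketch_trim(d)
left_only <- sketch_trim(d, which = "left")
print(out)
print(sapply(out, class))
```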