Title: | Basic Pattern Analysis |
---|---|
Description: | Run basic pattern analyses on character sets, digits, or combined input containing both characters and numeric digits. Useful for data cleaning and for identifying columns containing multiple or nonstandard formats. |
Authors: | Brandon Greenwell [aut, cre] |
Maintainer: | Brandon Greenwell |
License: | GPL (>= 2) |
Version: | 0.1.1.9000 |
Built: | 2025-02-04 02:37:35 UTC |
Source: | https://github.com/bgreenwell/bpa |
Perform a basic pattern analysis
```r
get_pattern(x, show_ws = TRUE, ws_char = "w")

basic_pattern_analysis(x, unique_only = FALSE, show_ws = TRUE, ws_char = "w",
                       useNA = c("no", "ifany", "always"), ...)

## Default S3 method:
basic_pattern_analysis(x, unique_only = FALSE, show_ws = TRUE, ws_char = "w",
                       useNA = c("no", "ifany", "always"), ...)

## S3 method for class 'data.frame'
basic_pattern_analysis(x, unique_only = FALSE, show_ws = TRUE, ws_char = "w",
                       useNA = c("no", "ifany", "always"), ...)

bpa(x, ...)
```
Argument | Description |
---|---|
x | A data frame or character vector. |
show_ws | Logical indicating whether or not to show whitespace using a special character. Default is TRUE. |
ws_char | Character string used to depict whitespace when show_ws = TRUE. Default is "w". |
unique_only | Logical indicating whether or not to show only the unique patterns. Default is FALSE. |
useNA | Character string indicating whether to include NA values; one of "no", "ifany", or "always". |
... | Additional optional arguments to be passed on to |
```r
basic_pattern_analysis(iris)
basic_pattern_analysis(iris, unique_only = TRUE)
```
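To make the encoding concrete, here is a minimal base-R sketch of a get_pattern()-style mapping. It assumes the convention described for match_pattern() (digits become "9", lowercase letters "a", uppercase letters "A"), that other characters such as hyphens are kept unchanged (as in the "999-9999" pattern used there), and that whitespace is replaced by ws_char when show_ws = TRUE. The name sketch_pattern() is hypothetical; this is not the package's implementation, and it only handles ASCII letters.

```r
# Hypothetical get_pattern()-style encoder; NOT the bpa source code.
# Assumes ASCII input; perl = TRUE makes the [a-z] and [A-Z] ranges
# independent of the current locale's collation order.
sketch_pattern <- function(x, show_ws = TRUE, ws_char = "w") {
  x <- as.character(x)
  x <- gsub("[0-9]", "9", x, perl = TRUE)  # digits -> "9"
  x <- gsub("[a-z]", "a", x, perl = TRUE)  # lowercase letters -> "a"
  x <- gsub("[A-Z]", "A", x, perl = TRUE)  # uppercase letters -> "A"
  if (show_ws) {
    # Whitespace is replaced last so that ws_char itself is not re-encoded
    x <- gsub("\\s", ws_char, x, perl = TRUE)
  }
  x  # punctuation and other characters pass through unchanged; NA stays NA
}

sketch_pattern(c("Male", "m ", "123-4567", NA))  # returns c("Aaaa", "aw", "999-9999", NA)
sketch_pattern("12 Main St", ws_char = " ")
```

Replacing whitespace last is the design choice that matters here: with the default ws_char = "w", doing it first would let the lowercase rule turn every "w" into "a".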
Extract values from a vector that match a particular pattern.
```r
match_pattern(x, pattern, unique_only = FALSE, ...)
```
Argument | Description |
---|---|
x | A vector, typically of class |
pattern | Character string specifying the particular pattern to match. |
unique_only | Logical indicating whether or not to return only unique values. Default is FALSE. |
... | Additional optional arguments to be passed on to |
The pattern specified by the required argument pattern must be a valid pattern produced by the get_pattern function. That is, all digits should be represented by "9", lowercase and uppercase letters by "a" and "A", respectively, and so on.
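Because a valid pattern consists of placeholder characters plus literal punctuation, it can also be read as an anchored regular expression. The sketch below uses that swapped-in technique; it is not how match_pattern() is documented to work. The hypothetical helper pattern_to_regex() assumes "9", "a", "A", and the whitespace character ws_char (default "w") are the only placeholders, and escapes every other non-alphanumeric character.

```r
# Hypothetical helper: translate a get_pattern()-style pattern into an
# anchored PCRE regular expression. Not part of bpa.
pattern_to_regex <- function(pattern, ws_char = "w") {
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "9") {
      "[0-9]"                      # any digit
    } else if (ch == "a") {
      "[a-z]"                      # any lowercase ASCII letter
    } else if (ch == "A") {
      "[A-Z]"                      # any uppercase ASCII letter
    } else if (ch == ws_char) {
      "\\s"                        # any whitespace character
    } else if (grepl("[A-Za-z0-9]", ch, perl = TRUE)) {
      ch                           # other letters/digits match literally
    } else {
      paste0("\\", ch)             # escape punctuation such as "-" or "/"
    }
  }, character(1), USE.NAMES = FALSE)
  paste0("^", paste(parts, collapse = ""), "$")
}

phone <- c("123-456-7890", "456-7890", "123-4567", "456-7890")
rx <- pattern_to_regex("999-9999")
grepl(rx, phone, perl = TRUE)  # FALSE TRUE TRUE TRUE
```

The anchors ^ and $ are what make "123-456-7890" fail: without them the expression would match its trailing "456-7890".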
```r
phone <- c("123-456-7890", "456-7890", "123-4567", "456-7890")
match_pattern(phone, pattern = "999-9999")
match_pattern(phone, pattern = "999-9999", unique_only = TRUE)
```
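For comparison, a minimal sketch of this kind of filtering, assuming it amounts to keeping the elements whose encoded form equals pattern. The helpers encode_pattern() and sketch_match() are hypothetical, use the digit/letter/whitespace convention described above, and (as a choice made here) drop NA inputs; they are not the package's code.

```r
# Hypothetical encoder using the "9" / "a" / "A" / "w" convention
encode_pattern <- function(x) {
  x <- gsub("[0-9]", "9", x, perl = TRUE)  # digits -> "9"
  x <- gsub("[a-z]", "a", x, perl = TRUE)  # lowercase -> "a"
  x <- gsub("[A-Z]", "A", x, perl = TRUE)  # uppercase -> "A"
  gsub("\\s", "w", x, perl = TRUE)         # whitespace -> "w"
}

# Hypothetical match_pattern()-style filter; NOT the bpa source code
sketch_match <- function(x, pattern, unique_only = FALSE) {
  x <- as.character(x)
  keep <- encode_pattern(x) == pattern
  keep[is.na(keep)] <- FALSE               # treat missing values as non-matches
  out <- x[keep]
  if (unique_only) unique(out) else out
}

phone <- c("123-456-7890", "456-7890", "123-4567", "456-7890")
sketch_match(phone, pattern = "999-9999")                      # "456-7890" "123-4567" "456-7890"
sketch_match(phone, pattern = "999-9999", unique_only = TRUE)  # "456-7890" "123-4567"
```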
Simulated (messy) data set to help illustrate some of the uses of basic pattern analysis.
A data frame with 1000 rows and 3 variables:

Variable | Description |
---|---|
Gender | Gender in various formats. |
Date | Dates in various formats. |
Phone | Phone numbers in various formats. |
```r
data(messy)
bpa(messy, unique_only = TRUE, ws_char = " ")
```
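When scanning a messy column, counting how often each pattern occurs makes rare, nonstandard formats stand out. The sketch below does this in base R with table(), whose useNA argument accepts the same "no", "ifany", and "always" values listed in the usage for basic_pattern_analysis(). The date values and the helper encode() are hypothetical illustrations, not the contents of messy or the package's code; whitespace is left as-is here for brevity.

```r
# Hypothetical sketch: tabulate pattern frequencies for one column.
encode <- function(x) {
  x <- gsub("[0-9]", "9", x, perl = TRUE)  # digits -> "9"
  x <- gsub("[a-z]", "a", x, perl = TRUE)  # lowercase -> "a"
  gsub("[A-Z]", "A", x, perl = TRUE)       # uppercase -> "A"
}

# Made-up example values, not taken from the messy data set
dates <- c("2015-01-31", "01/31/2015", "2015-02-28", "Jan 31 2015", NA)
pats <- encode(dates)

# useNA = "ifany" adds a count for NA only when missing values occur
freq <- sort(table(pats, useNA = "ifany"), decreasing = TRUE)
print(freq)
```

Here the dominant "9999-99-99" format appears twice, while each of the other formats (and the missing value) appears once, flagging them as candidates for cleaning.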
Remove leading and/or trailing whitespace from character strings.
```r
trim_ws(x, which = c("both", "left", "right"))
```
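Argument | Description |
---|---|
x | A data frame or vector. |
which | A character string specifying whether to remove both leading and trailing whitespace ("both", the default), only leading whitespace ("left"), or only trailing whitespace ("right"). |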
```r
# Toy example
d <- data.frame(x = c(" a ", "b ", "c"),
                y = c(" 1 ", "2", " 3"),
                z = c(4, 5, 6))
print(d)                   # print data as is
trim_ws(d)                 # print data with whitespace trimmed off
sapply(trim_ws(d), class)  # check that column types are preserved
```
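A minimal sketch of the same idea built on base R's trimws(), which accepts the same which values. The helper sketch_trim_ws() is hypothetical: it trims character and factor columns of a data frame (rebuilding factor levels from the trimmed values) and returns all other columns unchanged, which is one way to preserve column types. It is not the package's implementation.

```r
# Hypothetical trim_ws()-style helper; NOT the bpa source code.
sketch_trim_ws <- function(x, which = c("both", "left", "right")) {
  which <- match.arg(which)
  if (is.data.frame(x)) {
    # x[] <- ... replaces columns in place, so the result stays a data frame
    x[] <- lapply(x, function(col) {
      if (is.character(col)) {
        trimws(col, which = which)
      } else if (is.factor(col)) {
        factor(trimws(as.character(col), which = which))  # levels rebuilt
      } else {
        col                                               # e.g. numeric: untouched
      }
    })
    return(x)
  }
  if (is.character(x)) trimws(x, which = which) else x
}

d <- data.frame(x = c(" a ", "b "), z = c(1, 2), stringsAsFactors = FALSE)
str(sketch_trim_ws(d, which = "left"))
```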