stri_duplicated {stringi} | R Documentation |
stri_duplicated() determines which strings in a character vector are duplicates of other elements.
stri_duplicated_any() determines if there are any duplicated strings in a character vector.
stri_duplicated(str, fromLast = FALSE, ..., opts_collator = NULL)
stri_duplicated_any(str, fromLast = FALSE, ..., opts_collator = NULL)
str |
a character vector |
fromLast |
a single logical value indicating whether duplicates should be searched for starting from the end of the vector |
... |
additional settings for |
opts_collator |
a named list with ICU Collator's options
as generated with |
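As a rough illustration of how collator settings can be supplied, the sketch below passes a collation strength once through `...` and once through `opts_collator`. It assumes that the options list is built with stri_opts_collator() (listed among the related functions on this page) and that `strength = 2` makes the comparison ignore case; the input vector is made up for the example.

```r
library(stringi)

# Hypothetical input: the same letter in lower and upper case.
x <- c("a", "A")

# Default collator settings: case differences are significant.
default_dup <- stri_duplicated(x)

# Collator setting supplied directly through `...`.
via_dots <- stri_duplicated(x, strength = 2)

# The same setting supplied as a list built with stri_opts_collator().
via_opts <- stri_duplicated(x, opts_collator = stri_opts_collator(strength = 2))

print(default_dup)
print(via_dots)
print(via_opts)
```

Under these assumptions the last two calls should agree with each other, while the default call treats "a" and "A" as distinct.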
Missing values are regarded as equal.
Unlike duplicated and anyDuplicated, these functions test for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such operations are locale-dependent. Hence, stri_duplicated and stri_duplicated_any are significantly slower (but much better suited for natural language processing) than their base R counterparts.
See also stri_unique for extracting unique elements.
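To make the relationship between duplicate detection and unique-element extraction concrete, here is a minimal sketch that uses the logical result as a filter. The input vector and the variable names `dup` and `kept` are illustrative only.

```r
library(stringi)

x <- c("a", "b", "a", NA, "a", NA)

# TRUE marks an element that already appeared earlier in the vector.
dup <- stri_duplicated(x)

# Dropping the flagged elements keeps the first occurrence of each value;
# the two NAs are regarded as equal, so only one of them is kept.
kept <- x[!dup]
print(kept)

# stri_unique() is the dedicated function for extracting unique elements.
print(stri_unique(x))
```

Filtering with the mask is handy when the same mask must also be applied to other, parallel vectors; when only the unique strings are needed, stri_unique() is the more direct choice.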
stri_duplicated() returns a logical vector of the same length as str. Each of its elements indicates whether a canonically equivalent string was already found in str.
stri_duplicated_any() returns a single non-negative integer. A value of 0 indicates that all the elements in str are unique. Otherwise, it gives the index of the first non-unique element.
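The return value of stri_duplicated_any() can serve both as a yes/no test and as a position. The following sketch, with made-up input vectors and variable names, shows one way it might be used.

```r
library(stringi)

x <- c("a", "b", "a", NA, "a", NA)

# Position of the first element that repeats an earlier one.
first_dup <- stri_duplicated_any(x)

# All elements unique: the result is 0.
no_dup <- stri_duplicated_any(c("a", "b", "c"))

# A positive result doubles as an index into the input vector.
if (first_dup > 0) {
  message("first duplicate at position ", first_dup, ": ", x[first_dup])
}
```

Because 0 is never a valid R index, comparing the result with 0 is a safe way to branch before using it to subset the vector.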
Collation - ICU User Guide, http://userguide.icu-project.org/collation
Other locale_sensitive: %s!==%, %s!=%, %s<=%, %s<%, %s===%, %s==%, %s>=%, %s>%, %stri!==%, %stri!=%, %stri<=%, %stri<%, %stri===%, %stri==%, %stri>=%, %stri>%; stri_cmp, stri_cmp_eq, stri_cmp_equiv, stri_cmp_ge, stri_cmp_gt, stri_cmp_le, stri_cmp_lt, stri_cmp_neq, stri_cmp_nequiv, stri_compare; stri_count_boundaries, stri_count_words; stri_enc_detect2; stri_extract_all_boundaries, stri_extract_all_words, stri_extract_first_boundaries, stri_extract_first_words, stri_extract_last_boundaries, stri_extract_last_words; stri_locate_all_boundaries, stri_locate_all_words, stri_locate_first_boundaries, stri_locate_first_words, stri_locate_last_boundaries, stri_locate_last_words; stri_opts_collator; stri_order, stri_sort; stri_split_boundaries; stri_trans_tolower, stri_trans_totitle, stri_trans_toupper; stri_unique; stri_wrap; stringi-locale; stringi-search-boundaries; stringi-search-coll
# In the following examples, we have 3 duplicated values,
# "a" - 2 times, NA - 1 time
stri_duplicated(c("a", "b", "a", NA, "a", NA))
stri_duplicated(c("a", "b", "a", NA, "a", NA), fromLast=TRUE)
stri_duplicated_any(c("a", "b", "a", NA, "a", NA))

# compare the results:
stri_duplicated(c("\u0105", stri_trans_nfkd("\u0105")))
duplicated(c("\u0105", stri_trans_nfkd("\u0105")))

stri_duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"), strength=1)
duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"))