-
Notifications
You must be signed in to change notification settings - Fork 11.1k
StringsExplained
Joining together a sequence of strings with a separator can be unnecessarily
tricky -- but it shouldn't be. If your sequence contains nulls, it can be even
harder. The fluent style of Joiner makes it simple.
Joiner joiner = Joiner.on("; ").skipNulls();
return joiner.join("Harry", null, "Ron", "Hermione");returns the string "Harry; Ron; Hermione". Alternately, instead of using
skipNulls, you may specify a string to use instead of null with
useForNull(String).
You may also use Joiner on objects, which will be converted using their
toString() and then joined.
Joiner.on(",").join(Arrays.asList(1, 5, 7)); // returns "1,5,7"Warning: joiner instances are always immutable. The joiner configuration
methods will always return a new Joiner, which you must use to get the desired
semantics. This makes any Joiner thread safe, and usable as a static final
constant.
The built in Java utilities for splitting strings have some quirky behaviors.
For example, String.split silently discards trailing separators, and
StringTokenizer respects exactly five whitespace characters and nothing else.
Quiz: What does ",a,,b,".split(",") return?
"", "a", "", "b", ""null, "a", null, "b", null"a", null, "b""a", "b"- None of the above
The correct answer is none of the above: "", "a", "", "b". Only trailing empty
strings are skipped. What is this I don't even.
Splitter allows complete control over all this confusing behavior using a
reassuringly straightforward fluent pattern.
Splitter.on(',')
.trimResults()
.omitEmptyStrings()
.split("foo,bar,, qux");returns an Iterable<String> containing "foo", "bar", "qux". A Splitter may
be set to split on any Pattern, char, String, or CharMatcher.
| Method | Description | Example |
|---|---|---|
Splitter.on(char) |
Split on occurrences of a specific, individual character. | Splitter.on(';') |
Splitter.on(CharMatcher) |
Split on occurrences of any character in some category. |
Splitter.on(CharMatcher.BREAKING_WHITESPACE)Splitter.on(CharMatcher.anyOf(";,."))
|
Splitter.on(String) |
Split on a literal String. |
Splitter.on(", ") |
Splitter.on(Pattern)Splitter.onPattern(String)
|
Split on a regular expression. | Splitter.onPattern("\r?\n") |
Splitter.fixedLength(int) |
Splits strings into substrings of the specified fixed length. The last piece can be smaller than length, but will never be empty. |
Splitter.fixedLength(3) |
| Method | Description | Example |
|---|---|---|
omitEmptyStrings() |
Automatically omits empty strings from the result. |
Splitter.on(',').omitEmptyStrings().split("a,,c,d") returns "a", "c", "d"
|
trimResults() |
Trims whitespace from the results; equivalent to trimResults(CharMatcher.WHITESPACE). |
Splitter.on(',').trimResults().split("a, b, c, d") returns "a", "b", "c", "d"
|
trimResults(CharMatcher) |
Trims characters matching the specified CharMatcher from results. |
Splitter.on(',').trimResults(CharMatcher.is('_')).split("_a ,_b_ ,c__") returns "a ", "b_ ", "c". |
limit(int) |
Stops splitting after the specified number of strings have been returned. |
Splitter.on(',').limit(3).split("a,b,c,d") returns "a", "b", "c,d"
|
If you wish to get a List, use
splitToList() instead of split().
Warning: splitter instances are always immutable. The splitter configuration
methods will always return a new Splitter, which you must use to get the
desired semantics. This makes any Splitter thread safe, and usable as a
static final constant.
You can also use a splitter to deserialize a map by specifying a second
delimiter using withKeyValueSeparator().
The resulting MapSplitter will split the input into entries using the
splitter's delimiter, and then split those entries into keys and values using
the given key-value separator, returning a Map<String, String>.
In olden times, our StringUtil class grew unchecked, and had many methods like
these:
allAsciicollapsecollapseControlCharscollapseWhitespacelastIndexNotOfnumSharedCharsremoveCharsremoveCrLfretainAllCharsstripstripAndCollapsestripNonDigits
They represent a partial cross product of two notions:
- what constitutes a "matching" character?
- what to do with those "matching" characters?
To simplify this morass, we developed CharMatcher.
Intuitively, you can think of a CharMatcher as representing a particular class
of characters, like digits or whitespace. Practically speaking, a CharMatcher
is just a boolean predicate on characters -- indeed, CharMatcher implements
[Predicate<Character>] -- but because it
is so common to refer to "all whitespace characters" or "all lowercase letters,"
Guava provides this specialized syntax and API for characters.
But the utility of a CharMatcher is in the operations it lets you perform on
occurrences of the specified class of characters: trimming, collapsing,
removing, retaining, and much more. An object of type CharMatcher represents
notion 1: what constitutes a matching character? It then provides many
operations answering notion 2: what to do with those matching characters? The
result is that API complexity increases linearly for quadratically increasing
flexibility and power. Yay!
String noControl = CharMatcher.javaIsoControl().removeFrom(string); // remove control characters
String theDigits = CharMatcher.digit().retainFrom(string); // only the digits
String spaced = CharMatcher.whitespace().trimAndCollapseFrom(string, ' ');
// trim whitespace at ends, and replace/collapse whitespace into single spaces
String noDigits = CharMatcher.javaDigit().replaceFrom(string, "*"); // star out all digits
String lowerAndDigit = CharMatcher.javaDigit().or(CharMatcher.javaLowerCase()).retainFrom(string);
// eliminate all characters that aren't digits or lowercaseNote: CharMatcher deals only with char values; it does not understand
supplementary Unicode code points in the range 0x10000 to 0x10FFFF. Such logical
characters are encoded into a String using surrogate pairs, and a
CharMatcher treats these just as two separate characters.
Many needs can be satisfied by the provided CharMatcher factory methods:
any()none()whitespace()breakingWhitespace()invisible()digit()javaLetter()javaDigit()javaLetterOrDigit()javaIsoControl()javaLowerCase()javaUpperCase()ascii()singleWidth()
Other common ways to obtain a CharMatcher include:
| Method | Description |
|---|---|
anyOf(CharSequence) |
Specify all the characters you wish matched. For example, CharMatcher.anyOf("aeiou") matches lowercase English vowels. |
is(char) |
Specify exactly one character to match. |
inRange(char, char) |
Specify a range of characters to match, e.g. CharMatcher.inRange('a', 'z'). |
Additionally, CharMatcher has negate(), and(CharMatcher), and
or(CharMatcher). These provide simple boolean operations on CharMatcher.
CharMatcher provides a wide variety of methods to operate on occurrences of
the specified characters in any CharSequence. There are more methods provided
than we can list here, but some of the most commonly used are:
| Method | Description |
|---|---|
collapseFrom(CharSequence, char) |
Replace each group of consecutive matched characters with the specified character. For example, WHITESPACE.collapseFrom(string, ' ') collapses whitespaces down to a single space. |
matchesAllOf(CharSequence) |
Test if this matcher matches all characters in the sequence. For example, ASCII.matchesAllOf(string) tests if all characters in the string are ASCII. |
removeFrom(CharSequence) |
Removes matching characters from the sequence. |
retainFrom(CharSequence) |
Removes all non-matching characters from the sequence. |
trimFrom(CharSequence) |
Removes leading and trailing matching characters. |
replaceFrom(CharSequence, CharSequence) |
Replace matching characters with a given sequence. |
(Note: all of these methods return a String, except for matchesAllOf, which
returns a boolean.)
Charsets is now obsolete in favor of the constants in StandardCharsets.
They enable you to change this code...
try {
bytes = string.getBytes("UTF-8");
} catch (UnsupportedEncodingException e) {
// how can this possibly happen?
throw new AssertionError(e);
}...to this...
bytes = string.getBytes(UTF_8);CaseFormat is a handy little class for converting between ASCII case
conventions — like, for example, naming conventions for programming
languages. Supported formats include:
| Format | Example |
|---|---|
LOWER_CAMEL |
lowerCamel |
LOWER_HYPHEN |
lower-hyphen |
LOWER_UNDERSCORE |
lower_underscore |
UPPER_CAMEL |
UpperCamel |
UPPER_UNDERSCORE |
UPPER_UNDERSCORE |
Using it is relatively straightforward:
CaseFormat.UPPER_UNDERSCORE.to(CaseFormat.LOWER_CAMEL, "CONSTANT_NAME")); // returns "constantName"We find this especially useful, for example, when writing programs that generate other programs.
A limited number of general-purpose String utilities reside in the Strings
class.
- Introduction
- Basic Utilities
- Collections
- Graphs
- Caches
- Functional Idioms
- Concurrency
- Strings
- Networking
- Primitives
- Ranges
- I/O
- Hashing
- EventBus
- Math
- Reflection
- Releases
- Tips
- Glossary
- Mailing List
- Stack Overflow
- Android Overview
- Footprint of JDK/Guava data structures