YARNS is under construction. Some features may work, but documentation is incomplete and there are several performance and behavioral quirks. Feel free to use it, but keep in mind that just about everything is considered "unstable".
YARNS searches a dictionary to find words that match the criteria you specify, sorted by their popularity according to the Google Web Trillion Word Corpus.
The dictionary YARNS searches is a heavily processed blend of the enable1 word list and the Hunspell en_US dictionary.
The search uses an extended regular expression language to match words. The extension supports anagram queries along with filtering on modifications of found words. See the examples for clarity.
| Description | Query | Matches | Does not match |
|---|---|---|---|
| Match arbitrary character | `bl..` | blue | clue |
| Regex globs | `.*ue` | blue, clue | gruel |
| Regex sets and repetitions | `[bc]l[eu]{2}` | blue, clue | glue |
| Anagrams | `<lebu>` | blue, lube | clue, bluest |
| Subanagrams | `<lebus->` | be, blue, lube | blues |
| Superanagrams | `<lebu+>` | blues | blue |
| Transdelete | `<buelst-2>` | blue, lube | blues |
| Transadd | `<buel+2>` | bluest | blue |
| Transswap | `<roux~2>` | torn | pour |
| Combined | `<lb><eu+>` | blue, bluest | lube |
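
The anagram operators in the table can be read as comparisons between letter counts. The following is a minimal Python sketch of that reading, not YARNS's actual implementation: the function names are hypothetical, and the exact semantics (for example, that a subanagram must drop at least one letter and that `~2` means exactly two letters swapped) are assumptions inferred from the example rows above.

```python
from collections import Counter

def letter_diff(word, letters):
    """Return (missing, extra): how many of `letters` the word leaves unused,
    and how many letters the word has that `letters` does not supply."""
    w, l = Counter(word), Counter(letters)
    missing = sum((l - w).values())
    extra = sum((w - l).values())
    return missing, extra

def anagram(word, letters):            # <letters>
    return letter_diff(word, letters) == (0, 0)

def subanagram(word, letters):         # <letters->  (assumed: at least one letter dropped)
    missing, extra = letter_diff(word, letters)
    return extra == 0 and missing > 0 and len(word) > 0

def superanagram(word, letters):       # <letters+>  (assumed: at least one letter added)
    missing, extra = letter_diff(word, letters)
    return missing == 0 and extra > 0

def transdelete(word, letters, n):     # <letters-n>
    return letter_diff(word, letters) == (n, 0)

def transadd(word, letters, n):        # <letters+n>
    return letter_diff(word, letters) == (0, n)

def transswap(word, letters, n):       # <letters~n> (assumed: exactly n letters replaced)
    return letter_diff(word, letters) == (n, n)

print(anagram("lube", "lebu"))         # True
print(transdelete("blues", "buelst", 2))  # False: only one letter was dropped
print(transswap("torn", "roux", 2))    # True: u and x replaced by t and n
```

Under these assumptions every row of the anagram examples comes out as listed, except the combined query, which mixes anagram groups inside a regex and is outside the scope of this sketch.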

One of the fancier features of regex is look-around. YARNS supports this in some capacity, but performance can be poor in complicated cases. YARNS normally adds start and end anchors to your query, so that searching for a given word (e.g. `blue`) returns only that word and not words containing it (e.g. `blues`). For look-around expressions, however, these anchors are not added. Add `^` or `$` yourself where needed to anchor the look-around at either end of the word.
| Description | Query | Matches | Does not match |
|---|---|---|---|
| "And"-style matching | `(?=[armenia]{4,10}$).*m.*[a]` | armenia, anemia | arena, plasma |
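
To see why the `$` inside the look-ahead matters, the behaviour can be approximated with Python's standard `re` module. This is only an illustration of ordinary regex semantics, not of YARNS itself: `re.fullmatch` stands in for the implicit start/end anchoring, and the word list and `search` helper are made up for the example.

```python
import re

# Hypothetical mini word list, for illustration only.
WORDS = ["blue", "blues", "armenia", "anemia", "arena", "plasma", "marimba"]

def search(pattern, words=WORDS):
    # re.fullmatch acts as if the whole pattern were wrapped in ^...$,
    # approximating the anchors YARNS adds around a query.
    return [w for w in words if re.fullmatch(pattern, w)]

# A plain query is anchored, so "blues" is not returned.
print(search("blue"))
# ['blue']

# With $ inside the look-ahead, every letter must come from "armenia".
print(search(r"(?=[armenia]{4,10}$).*m.*[a]"))
# ['armenia', 'anemia']

# Without $, the look-ahead only checks the first few letters,
# so "marimba" slips through even though it contains a "b".
print(search(r"(?=[armenia]{4,10}).*m.*[a]"))
# ['armenia', 'anemia', 'marimba']
```

The look-ahead is evaluated at the start of the word but is not tied to its end unless you say so, which is why the unanchored version accepts a word whose later letters fall outside the set.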
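
Sometimes you want to search for words based on multiple criteria. Filters are added after the query, separated by semicolons. A word passes a filter if the filter, once filled in, matches some word in the dictionary. Back-references in each filter (`\0` for the entire match, `\1` for the first capture group, and so on) are replaced with the corresponding text from the matched word, so you can filter based on parts of the matched word.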
You can create very expensive queries that time out if you aren't careful! As a rule of thumb, put the most restrictive queries and filters first so that YARNS can rule out words more quickly. Avoid queries like `.*;[some]+\0[regex]+` whenever possible! If you can get away with a simple filter like `.*;anti\0` or `.*;<extra\0>`, you may be fine because YARNS can process these in constant time, but if you do run into issues, consider reducing the number of requested results to avoid timing out.
| Description | Query | Matches | Does not match |
|---|---|---|---|
| Filter based on entire match (slow!) | `.*;anti\0` | (body, antibody) | clue |
| More efficient way of doing the above | `anti(.*);\1` | (antibody, body) | clue |
| Multiple filters | `pro(.*);con\1;\1` | (product, conduct, duct) | produce |
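
The filter rows above can be reproduced with a small Python sketch. It is a naive model of the semantics as described here, not how YARNS is implemented: the `run` and `substitute` helpers and the word list are hypothetical, the query is split on every semicolon, only plain regex is handled (no anagram groups), and each filter is checked by scanning the whole word list, which is exactly the kind of cost the performance advice above warns about.

```python
import re

# Hypothetical mini dictionary, for illustration only.
WORDS = {"antibody", "body", "clue", "product", "conduct", "duct", "produce"}

def substitute(template, match):
    # Replace \0 with the whole match and \1..\9 with the captured groups,
    # escaped so the inserted text is matched literally.
    def repl(m):
        return re.escape(match.group(int(m.group(1))))
    return re.sub(r"\\(\d)", repl, template)

def run(query, words=WORDS):
    pattern, *filters = query.split(";")
    results = []
    for word in sorted(words):
        m = re.fullmatch(pattern, word)
        if not m:
            continue
        row = [word]
        for f in filters:
            concrete = substitute(f, m)
            # A filter passes if some dictionary word matches it.
            hit = next((w for w in sorted(words) if re.fullmatch(concrete, w)), None)
            if hit is None:
                break
            row.append(hit)
        else:
            # Reached only when no filter failed.
            results.append(tuple(row))
    return results

print(run(r".*;anti\0"))
# [('body', 'antibody')]
print(run(r"anti(.*);\1"))
# [('antibody', 'body')]
print(run(r"pro(.*);con\1;\1"))
# [('product', 'conduct', 'duct')]
```

In this model `.*;anti\0` has to test every word in the dictionary against the filter, while `anti(.*);\1` only tests the few words that start with "anti", which illustrates why putting the restrictive part first helps.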