Regular expressions in WebLicht

Regexes are enclosed in forward slashes / /, for example: [word=/lo+l+/] → finds all strings consisting of an l followed by one or more o's and then one or more l's (e.g., lol, looool, loooooolllll)

A regex must match the entire string. For example, to find lemmas that contain “ent” anywhere in the string, use [lemma=/.*ent.*/]; [lemma=/ent/] would only find the lemma ent itself.
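Whole-string matching can be reproduced outside WebLicht with Python's `re.fullmatch`, which succeeds only if the pattern covers the entire string, whereas `re.search` looks for a match anywhere in it. This is a minimal sketch that assumes WebLicht's anchoring behaves like `fullmatch`; the lemma list is made up for illustration.

```python
import re

# Hypothetical sample lemmas, for illustration only
lemmas = ["student", "enter", "parent", "cement", "tent", "house"]

# fullmatch: the pattern must cover the whole lemma, as in a WebLicht query
bare = [l for l in lemmas if re.fullmatch(r"ent", l)]
padded = [l for l in lemmas if re.fullmatch(r".*ent.*", l)]

# search: finds "ent" anywhere, so no .* padding is needed
searched = [l for l in lemmas if re.search(r"ent", l)]

print(bare)      # []
print(padded)    # ['student', 'enter', 'parent', 'cement', 'tent']
print(searched)  # same list as padded
```

Without the `.*` on both sides, the whole-string pattern finds nothing here, because no lemma is exactly "ent".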

Regexes can be used with any of the search attributes, alone or in combination: token/word, lemma (lemma or canonical form), pos (part of speech), cat (category label for constituents)
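Conceptually, a query that constrains several attributes matches a token only if every regex matches the corresponding attribute value in full. The sketch below models tokens as Python dictionaries with hypothetical `word`, `lemma` and `pos` values (Penn-style tags, chosen for illustration) and a hypothetical helper `matches`; it illustrates the logic only and is not WebLicht's implementation or query syntax.

```python
import re

# Hypothetical tokens with word, lemma and pos attributes (Penn-style tags)
tokens = [
    {"word": "students", "lemma": "student", "pos": "NNS"},
    {"word": "entering", "lemma": "enter", "pos": "VBG"},
    {"word": "parent", "lemma": "parent", "pos": "NN"},
]

def matches(token, constraints):
    """Return True if every attribute regex matches the token's value in full."""
    return all(re.fullmatch(pattern, token[attr])
               for attr, pattern in constraints.items())

# Lemma contains "ent" AND the part of speech starts with "NN" (any noun tag)
query = {"lemma": r".*ent.*", "pos": r"NN.*"}
hits = [t["word"] for t in tokens if matches(t, query)]
print(hits)  # ['students', 'parent']
```

"entering" satisfies the lemma constraint but is excluded because its tag VBG does not match NN.*.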

Regex Operators in WebLicht

.   matches any single character, e.g. [word=/se./] → see, sea, set

*   matches the preceding element 0 or more times, e.g. [word=/.*man/] → man, woman, gentleman, Frenchman, German 

+   matches the preceding element at least once, e.g. [word=/.+man/] → woman, gentleman, Frenchman, German (but not: man)

?   makes the preceding element optional (0 or 1 times), e.g. [word=/favou?r/] → favour, favor

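[ ]   specifies a set of alternative characters, any one of which matches at that position, e.g. [word=/19[56789]./] → the years 1950 to 1999 (any four-character string starting with 195 to 199)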

|   specifies alternatives, e.g. [lemma=/house|home/] → house or home as lemmas

^   at the start of a set [ ], excludes the listed characters, e.g. [word=/sec[^u].*/] → secret, second (but not: secure or other words that start with secu)

( )   grouping, e.g. [word=/([mn][ae])+/] → an m or n followed by an a or e, repeated at least once → me, name

\   escapes the reserved characters ( ) [ ] { } . + $ * ? | / \ so that they are matched literally rather than as regex operators, e.g. [word=/etc\./] → etc.
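The operator examples above can be checked with Python's `re.fullmatch`. This sketch assumes that WebLicht's regex dialect agrees with Python's `re` module for these basic operators; the non-matching words (such as seen, homes or etcx) and the `etc\.` pattern are added here for contrast and are not taken from WebLicht documentation.

```python
import re

# Each entry: (pattern, words expected to match, words expected not to match).
# Patterns are written without WebLicht's surrounding slashes.
examples = [
    (r"se.", ["see", "sea", "set"], ["seen"]),
    (r".*man", ["man", "woman", "German"], ["mane"]),
    (r".+man", ["woman", "gentleman"], ["man"]),
    (r"favou?r", ["favour", "favor"], ["favouur"]),
    (r"19[56789].", ["1950", "1999"], ["1949", "2000"]),
    (r"house|home", ["house", "home"], ["homes"]),
    (r"sec[^u].*", ["secret", "second"], ["secure"]),
    (r"([mn][ae])+", ["me", "name"], ["man"]),
    (r"etc\.", ["etc."], ["etcx"]),
]

for pattern, positives, negatives in examples:
    # fullmatch mimics WebLicht's rule that the regex must cover the whole string
    for word in positives:
        assert re.fullmatch(pattern, word), (pattern, word)
    for word in negatives:
        assert not re.fullmatch(pattern, word), (pattern, word)
    print(f"/{pattern}/ ok")
```

Because matching is whole-string, house|home rejects homes even though home is a prefix of it.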