Search syntax: Lancaster BNC interface

Wildcards

  • ? for a single arbitrary character: s?n → sin, son, sun, …
  • * for zero or more characters: *able → able, capable, table, …
  • + for one or more characters: +able → table, capable, … but not able
  • ??+ for three or more characters, etc.: ??+able → capable, … but not table, able
  • a wildcard can also be used on its own to stand for a whole word
  • list comma-separated alternatives (optionally including wildcards) in square brackets: neighbo[u,]r → neighbour, neighbor; ??+[able,ible] → capable, possible
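
The wildcard rules above can be illustrated by translating a pattern into an ordinary regular expression. The following is a minimal Python sketch, not the interface's actual implementation; the function names wildcard_to_regex and matches are hypothetical, and matching is done against a single word at a time.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    """Translate a simple wildcard pattern into a Python regex (illustrative only)."""
    out = []
    depth = 0  # > 0 while inside [...] alternatives
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash protects the next character: match it literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "?":
            out.append(".")    # exactly one arbitrary character
        elif ch == "*":
            out.append(".*")   # zero or more characters
        elif ch == "+":
            out.append(".+")   # one or more characters
        elif ch == "[":
            depth += 1
            out.append("(?:")  # open a group of alternatives
        elif ch == "]" and depth > 0:
            depth -= 1
            out.append(")")
        elif ch == "," and depth > 0:
            out.append("|")    # separator between alternatives
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)

def matches(pattern: str, word: str) -> bool:
    """True if the whole word matches the wildcard pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), word) is not None

if __name__ == "__main__":
    for pat, word in [("s?n", "sun"), ("+able", "able"), ("neighbo[u,]r", "neighbor")]:
        print(pat, word, matches(pat, word))
```

Note that an empty alternative, as in neighbo[u,]r, simply becomes an empty branch of the regex group, which is why both spellings match.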

PoS tags

  • can be attached to a word form or used in isolation to match any word with that tag
  • both uses require an underscore _ before the tag
  • full list available here

Lemma search

  • curly brackets: {swim} → swim, swims, swimming, swam, swum…
  • specify PoS, e.g. {house/V} → house, houses, housed (tagged as verb)
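
Conceptually, a lemma query selects every token whose lemma annotation matches, optionally filtered by part of speech. The Python sketch below illustrates this with a tiny hand-made sample; the Token type, the SAMPLE data, the simplified "V"/"N" labels and the lemma_search function are all hypothetical and only stand in for the annotations a real corpus provides.

```python
from typing import List, NamedTuple, Optional

class Token(NamedTuple):
    word: str
    lemma: str
    pos: str  # simplified class such as "V" or "N" (hypothetical labels)

# Hypothetical hand-annotated sample; a real corpus supplies these annotations.
SAMPLE = [
    Token("houses", "house", "N"),
    Token("housed", "house", "V"),
    Token("swam", "swim", "V"),
    Token("swimming", "swim", "V"),
    Token("house", "house", "V"),
]

def lemma_search(tokens: List[Token], lemma: str, pos: Optional[str] = None) -> List[str]:
    """Return the word forms whose lemma (and, if given, PoS class) match."""
    return [
        t.word
        for t in tokens
        if t.lemma == lemma and (pos is None or t.pos == pos)
    ]

if __name__ == "__main__":
    print(lemma_search(SAMPLE, "swim"))        # analogous to {swim}
    print(lemma_search(SAMPLE, "house", "V"))  # analogous to {house/V}
```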

Proximity queries

  • start <<s>> business → start and business in the same sentence (PoS/lemma constraints can be used)
  • morning <<3>> evening → morning and evening within 3 tokens
  • multiple constraints can be chained: {day} <<5>> {morning} <<5>> {evening} → day must co-occur with both morning and evening within 5 tokens
  • can be nested with parentheses: {waste/V} <<s>> (breath <<3>> here) → waste must co-occur with breath and here in the same sentence, and breath and here must be within 3 tokens of each other
  • protect wildcards and other metacharacters with a backslash \ to match the literal character: \? → ? whereas an unescaped ? → a, b, c, …, A, B, C, …, 1, 2, …, ., !, ?, …
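
The distance constraint in the proximity queries above can be pictured as a check on token positions. This Python sketch is only an approximation: the function name within is hypothetical, and it assumes that "within n tokens" means the two positions differ by at most n, in either order, which may not match the interface's exact counting.

```python
from typing import List

def within(tokens: List[str], first: str, second: str, max_dist: int) -> bool:
    """True if `first` and `second` occur at most `max_dist` positions apart."""
    lowered = [t.lower() for t in tokens]  # mimic case-insensitive matching
    pos_a = [i for i, t in enumerate(lowered) if t == first]
    pos_b = [i for i, t in enumerate(lowered) if t == second]
    # Assumption: either order counts, and distance is the index difference.
    return any(abs(i - j) <= max_dist for i in pos_a for j in pos_b)

if __name__ == "__main__":
    sentence = "from early morning until late evening".split()
    print(within(sentence, "morning", "evening", 3))
    print(within(sentence, "morning", "evening", 2))
```

A chained query can then be read as several such checks that must all hold, for example within(tokens, "day", "morning", 5) and within(tokens, "day", "evening", 5).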

Miscellaneous

  • searches are case-insensitive by default; for case-sensitive matching, set the “Query mode” drop-down menu to “Simple query (case-sensitive)”
  • use the :d modifier to ignore accents: fiancee:d → fiancée, fiancee
  • contracted forms and punctuation must be entered as separate tokens, set off by a space
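
The effect of ignoring accents can be approximated by stripping diacritics before comparing. This is a minimal Python sketch using the standard unicodedata module; the function names fold_accents and same_ignoring_accents are hypothetical, and the sketch makes no claim about how the :d modifier is actually implemented.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Remove combining diacritics, e.g. é becomes e."""
    decomposed = unicodedata.normalize("NFD", text)  # split base letter and accent
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def same_ignoring_accents(a: str, b: str) -> bool:
    """Compare two words while ignoring accents and letter case."""
    return fold_accents(a).casefold() == fold_accents(b).casefold()

if __name__ == "__main__":
    print(fold_accents("fiancée"))
    print(same_ignoring_accents("fiancee", "Fiancée"))
```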

Source: https://corpora.linguistik.uni-erlangen.de/newsscape/doc/cqpweb-simple-syntax-help.pdf