Sign In Sign Out Subscribe to Mailing Lists Unsubscribe or Change Settings Help

smoe.org mailing lists

                            Introduction to Patterns

Patterns are used by various commands and configuration settings:

  By the archive-sync command, to match archive names.
  By the lists and rekey commands, to match list names.
  By the set-pattern, unregister-pattern, unsubscribe-pattern, which, 
    and who commands, to match e-mail addresses.

  By the access_rules, advertise, bounce_probe_pattern, bounce_rules,
    delivery_rules, noadvertise, and post_limits settings, to match 
    e-mail addresses.
  By the admin_body and taboo_body settings, to match lines in
    the body of a posted message.
  By the admin_headers and taboo_headers settings, to match lines in
    the headers of a posted message.
  By the attachment_filters and attachment_rules settings, to match 
    message content types.
  By the quote_pattern setting, to count the lines in the body of a
    posted message that are marked as being written by someone else.
  By the signature_separator setting, to match the beginning of 
    an e-mail signature.


There are four supported types of pattern, described below:
  Substring Patterns,   like         "example"
  Glob Patterns,        like         %example% 
  Regular Expressions,  like         /example/
  Undelimited Patterns, like          example

Several examples of regular expressions are illustrated:
  Example 1  - a list of special characters
  Example 2  - escaping '.' is required
  Example 3  - escaping '@' is required
  Example 4  - matching the beginning and end of string
  Example 5  - matching anything and everything
  Example 6  - escaping '*' is required
  Example 7  - case sensitivity
  Example 8  - overly safe escaping doesn't hurt
  Example 9  - matching (or NOT matching) white space
  Example 10 - negated or inverted matches

Majordomo is written in the Perl programming language.  Perl regular 
expressions are a powerful but complicated tool for pattern matching.
To eliminate some of the complexity, three simpler forms of pattern 
matching are provided, in addition to full Perl regular expressions.

A pattern is usually enclosed in "delimiters," with optional "modifiers"
outside the delimiters.  The delimiters indicate where the pattern begins 
and ends, and the modifiers change how matches are found.  For example, 
in the pattern:

  "example.net"i

the delimiters are quotes, and the 'i' is a modifier.  The most common
modifier, the letter 'i', makes the matching case-insensitive, meaning 
that small and capital letters are considered identical.

The negation modifier, '!', may be used to invert any of the four
kinds of pattern.  For example,
  !edu
would match any string of characters that does not contain "edu".

The special pattern
  ALL
will match everything.

Substring Patterns
------------------

  Examples: "example.com"
            "user@somewhere.example.com"i

The delimiter is a double quote.  There are no special characters; the
pattern matches if the pattern occurs anywhere within the text to be
matched.  A trailing 'i' specifies that the matching is case-insensitive.
For instance,
  "bsc"          would match        unsubscribe
  "bsc"          would not match    unsuBsCribe
  "bsc"i         would match        unsuBsCribe


Glob Patterns
-------------

  Examples: %user@*example.com%i
            %u-???@*example.com%i

The delimiter is a percent sign.  These patterns are reminiscent of
file-matching patterns from the DOS and Unix command line interfaces.
Special characters include:

  ?    matches any single character
  *    matches any number (including zero) of any character. 
  []   are used to define character classes.  For instance,
       [abc] will match any one of the letters a, b, or c.  This
       style of grouping has the same effect as in regular expressions.


Regular Expressions
-------------------

What follows is a basic discussion of Perl regular expressions. 
There is one important difference between Majordomo regular expressions
and Perl regular expressions: in Perl version 5 and above, the
'@' character should be "escaped" with a backslash, \@.  Majordomo
will compensate if you forget to add the backslash, but for
the sake of correctness you should always include it when you
are trying to match a literal '@' symbol.

Example 1 - a list of special characters

A regular expression is a concise way of expressing a pattern in
a series of characters.  The full power of regular expressions can
make some difficult tasks quite easy, but we will only brush the
surface here.

The character / is used to mark the beginning and end of a regular
expression.  Letters and numbers stand for themselves.  Many of the
other characters are symbolic.  Some commonly used ones are:

  !     negates what follows, matching when the expression does NOT
  \@    the `@' found in nearly all addresses; it must be preceded
        by a backslash to avoid errors.
  .     (period) any character
  *     previous character, zero or more times; note especially...
  .*    any character, zero or more times
  +     previous character, one or more times; so for example...
  a+    letter "a", one or more times
  \     next character stands for itself; so for example...
  \.     literally a period, not meaning "any character"
  ^     beginning of the string; so for example...
  ^a    a string beginning with letter "a"
  $     end of the string; so for example...
  a$    a string ending with letter "a"



Example 2 - escaping '.' is required

     /foo\.example\.com/

Notice that the periods are preceded by a backslash so that they are
interpreted as periods, rather than wildcards.  This matches any string
containing:

     foo.example.com

such as:

     foo.example.com
     bar.foo.example.com
     user@bar.foo.example.com
     users%bar.foo.example.com@example.com


Example 3 - escaping '@' is required

     /johndoe\@.*foo\.example\.com/

The `@' has special meaning to Perl and should be prefixed with a backslash
to avoid errors.  The string ".*" means "any character, zero or more
times".  So this matches:

     johndoe@foo.example.com
     johndoe@terminus.foo.example.com
    ajohndoe@terminus.foo.example.com

But it doesn't match:

     johndoe@example.com
     brent@foo.example.com


Example 4 - matching the beginning and end of string

     /^johndoe\@.*cs\.example\.org$/

This is similar to Example 4.3, and matches the same first two strings:

     johndoe@foo.example.org
     johndoe@terminus.foo.example.org

But it doesn't match:

     ajohndoe@terminus.foo.example.org

...because the regular expression says the string has to begin with
letter "j" and end with letter "g", by using the ^ and $ symbols, and
neither of those is true for ajohndoe@terminus.foo.example.org@example.com.


Example 5 - matching anything and everything

     /.*/

This is the regular expression that matches anything
(any character, zero or more times).


Example 6 - escaping '*' is required

     /.\*johndoe/

Here the * is preceded by a \, so it refers literally to an asterisk
character and not the symbolic meaning "zero or more times".  The '.' still
has its symbolic meaning of "any one character", so it would match:

     a*johndoe
     s*johndoe

Because the . by itself implies one character, it would not match:

     *johndoe


Example 7 - case sensitivity

Normally all matches are case sensitive; you can make any match case
insensitive by appending an `i' to the end of the expression.

     /example\.com/i

This would match example.com, EXAMPLE.com, ExAmPlE.cOm, etc.  Removing the `i':

     /example\.com/

...would match example.com but not EXAMPLE.com or any other capitalization.


Example 8 - overly safe escaping doesn't hurt

To be on the safe side put a \ in front of any characters in the
regular expressions that are not numbers or letters.  In order to put
a / into the regular expression, the same rule holds: precede it
with a \.  Thus, with \ in front of the / and = characters, this:

     /\/CO\=US/

...matches /CO=US and may be a useful regular expression to those of you
who need to deal with X.400 addresses that contain / characters.


Example 9 - matching (or NOT matching) white space

Normally, all whitespace within a pattern is matched verbatim, but it is
sometimes desirable to add some additional space within a pattern to make
it more readable.  For instance, here is a pattern matching some common
quoting characters in email:

  /^(-|:|>|[a-z]+>)/i

This can be a bit difficult to follow, so we can space it out a bit:

  /^( - | : | > | [a-z]+> )/xi

The 'x' modifier specifies that whitespace is to be ignored, and makes the
pattern a bit easier to read.  If you want to match actual whitespace, use
'\s'.

Note that the 'x' modifier provides additional functionality to Perl code
relating to comments, but because Majordomo requires patterns to lie all on
a single line, this is not significant here.


Example 10 - negated or inverted matches

Negated matches (like !/^sub/) work in places where they have meaning, such
as the taboo expression matcher which has lots of complicated logic to handle
them, but not all places. Majordomo patterns just get sent through a function
that turns them into regular expressions... which may or may not make sense
in the context you want to use them.

For example
   who-regexp listname !/xxx\.com/
will produce a list of subscribers to "listname" that are NOT from the
'xxx.com' domain. Be careful to escape the period, which otherwise will
match any character, not just a period.

Undelimited Patterns
--------------------

In the previous sections, all of the patterns were considered to be
enclosed in quotes, slashes, or percent signs.  It is legitimate 
to use patterns without enclosing them in those delimiters in some
cases.  However, the kind of matching done will depend upon where 
the pattern is used.

  In the archive-sync command, an exact match.
  In the lists and rekey commands, an exact, case-insensitive match.
  In the which and who commands, a case-insensitive substring match.

  In the attachment_filters setting, an exact, case-insensitive match.
  In the attachment_rules setting, an exact, case-insensitive match.
  In the post_limits setting, a case-insensitive substring match.

  In all of the other cases mentioned in the first section, pattern
  delimiters are required.  Using a pattern without delimiters will
  cause an error.


See Also:
   help admin
   help archive
   help configset_access_rules
   help configset_advertise
   help configset_admin_body
   help configset_admin_headers
   help configset_attachment_filters
   help configset_attachment_rules
   help configset_bounce_probe_pattern
   help configset_bounce_rules
   help configset_delivery_rules
   help configset_noadvertise
   help configset_post_limits
   help configset_quote_pattern
   help configset_signature_separator
   help configset_taboo_body
   help configset_taboo_headers
   help lists
   help overview
   help rekey
   help set
   help unregister
   help unsubscribe
   help which
   help who

This is the "patterns" help document for 
Majordomo 2, version 0.1200401130.

For a list of all help documents, send the following command:
   help topics
in the body of a message to mj2@smoe.org.

For assistance, please contact the smoe.org administrators.
Sign In Sign Out Subscribe to Mailing Lists Unsubscribe or Change Settings Help