regular expressions

Author: Kevin Buettner
Date:  
Subject: regular expressions
On Feb 19, 6:06pm, Julian M Catchen wrote:

> Ok, I was confused, the rx library is included by default in glibc. The
> problem I am having is the following:
>
> To use the regex functions, I have to create a structure, regex_t. If I
> just declare it like this:
>
> struct regex_t rx;
>
> The compiler errors out saying "storage size of `rx' isn't known".
>
> Can anyone give me some pointers on how to malloc this thing?


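Do you have a ``#include <regex.h>'' statement prior to your
declaration of ``rx''?  Also, regex_t is a typedef name, not a structure
tag, so the variable should be declared as ``regex_t rx;''.  Writing
``struct regex_t rx;'' refers to a different, never-defined structure
type, and that is what produces the "storage size of `rx' isn't known"
error.  There's no need to malloc anything; an ordinary local variable
works fine, as in the program below.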

Below is a simple program that demonstrates the use of the functions
documented on the regcomp() man page.  To try it, put it in a file
called simple-grep.c and do the following:

ocotillo:ctests$ gcc -Wall -o simple-grep -g simple-grep.c 
ocotillo:ctests$ ./simple-grep 'reg(exec|comp)' <simple-grep.c 
Line 21:   errcode = regcomp (&rx, argv[1], REG_EXTENDED | REG_NOSUB);
Line 37:       if (regexec (&rx, line, 0, 0, 0) == 0)


--- simple-grep.c ---
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>

#define MAXLINESIZE 4096

int
main (int argc, char **argv)
{
  regex_t rx;                  /* a typedef: no "struct" keyword */
  int errcode, linenum;
  char line[MAXLINESIZE];

  if (argc != 2)
    {
      fprintf (stderr, "Usage: %s pattern\n", argv[0]);
      exit (1);
    }

  errcode = regcomp (&rx, argv[1], REG_EXTENDED | REG_NOSUB);
  if (errcode != 0)
    {
      size_t bufsize = regerror (errcode, &rx, 0, 0);  /* size incl. NUL */
      char *buf = malloc (bufsize);

      if (buf != NULL)
        regerror (errcode, &rx, buf, bufsize);
      fprintf (stderr, "Error compiling pattern: %s\n",
               buf != NULL ? buf : "out of memory");
      exit (1);
    }

  linenum = 1;
  while (fgets (line, MAXLINESIZE, stdin))
    {
      if (regexec (&rx, line, 0, 0, 0) == 0)
        {
          printf ("Line %d: %s", linenum, line);
          /* Add a newline if fgets stopped mid-line (an overlong line)
             or the last line of input lacks one.  */
          if (line[0] != '\0' && line[strlen (line) - 1] != '\n')
            printf ("\n");
        }

      /* An overlong line arrives in several chunks; count it only once
         the chunk containing its newline has been read.  */
      if (strlen (line) != MAXLINESIZE - 1 || line[MAXLINESIZE - 2] == '\n')
        linenum++;
    }

  regfree (&rx);
  exit (0);
}
--- end simple-grep.c ---

By way of comparison, the above C program is roughly equivalent to the
following Perl program:

--- simple-grep.pl ---
#!/usr/bin/perl -w

$pattern = shift;

while (<>) {
  chomp;
  print "$ARGV, $.:$_\n"    if /$pattern/;
} continue {
  close ARGV if eof;    # reset $. so each file is numbered from 1
}
--- end simple-grep.pl ---


Actually, the Perl version is superior, since it can read from files
named on the command line as well as from STDIN.  It also handles long
lines much less clumsily, because Perl reads each line whole instead of
in fixed-size chunks.

Here's an example of the Perl program in action:

ocotillo:ctests$ ./simple-grep.pl 'reg(exec|comp)' <simple-grep.c 
-, 21:  errcode = regcomp (&rx, argv[1], REG_EXTENDED | REG_NOSUB);
-, 37:      if (regexec (&rx, line, 0, 0, 0) == 0)
ocotillo:ctests$ ./simple-grep.pl 'reg(exec|comp)' simple-grep.c 
simple-grep.c, 21:  errcode = regcomp (&rx, argv[1], REG_EXTENDED | REG_NOSUB);
simple-grep.c, 37:      if (regexec (&rx, line, 0, 0, 0) == 0)


HTH,

Kevin