Student: | Matthias-Christian Ott |
---|---|
Mentor: | Alistair Crooks |
During this year’s Google Summer of Code I want to improve the performance of NetBSD’s regular expression library and add support to it for wide characters.
See also: Implementing efficient wide character regular expressions
We have chosen TRE because of Ville Laurikari’s convincing master thesis and Russ Cox’s article on regular expressions. Ville agreed to release TRE under a 2-clause BSD licence and we already ported TRE to NetBSD. TRE and agrep will integrated into the NetBSD src repository soon.
At the moment only a performance benchmark which is based on realistic regular expressions and data to ensure that TRE is verifiably efficient is missing.
20 April 2009 | Project accepted and officially announced |
26 April 2009 | Published project page |
6 May 2009 | Added an overview of regular expression engines |
23 May 2009 | Initial import of the regular expression fuzzer |
8 June 2009 | Initial import of TRE |
13 June 2009 | Initial import of AT&T’s regression tests |
18 June 2009 | Added termcap and recursion support to agrep |
30 June 2009 | TRE builds as part of NetBSD’s source TRE |
3 August 2009 | Code posted for public review |
See also: Google Summer of Code 2009 Timeline
Name | Licence | Syntax | Algorithm | wchar_t support |
---|---|---|---|---|
Curie | MIT | POSIX (subset) | non-backtracking NFA | no (UTF-8 support) |
dietlibc | GNU GPL, version 2 | POSIX (subset) | non-backtracking NFA (?) | no |
glibc | GNU LGPL, version 2.1 | POSIX | DFA | yes |
Hackerlab C Library | GNU GPL, version 2 | POSIX | DFA (?) | no (t_uchar support) |
libregexp9 | MIT or Lucent Public License | regexp | non-backtracking NFA | no (UTF-8 support) |
microregex | BSD (3-clause) | Perl | non-backtracking NFA | no |
NetBSD libc | BSD (2-clause) | POSIX | DFA | no |
Oniguruma | BSD (2-clause) | POSIX, Ruby | ? | no (support for various encodings) |
PCRE | BSD (3-clause) | Perl | backtracking NFA | no |
TRE | GNU LGPL, version 2.1 | POSIX with extensions | backtracking, parallel and approximate parallel TNFA | yes |
T-Rex | zlib/libpng | T-Rex | backtracking NFA | yes (only UTF-8) |
utf8regex | GNU LGPL, version 2.1 | POSIX | DFA | no (UTF-8 support) |
See also:
I have developed a fuzzer for regular expressions, which randomly generates strings from POSIX-compliant regular expressions and tests whether the regular expression library accepts the string. Moreover, AT&T’s regession tests are used to test POSIX-conformance.
:wq
$Id: index.html,v 1.14 2009/08/19 18:21:03 hyperyl Exp $