
| Student: | Matthias-Christian Ott |
|---|---|
| Mentor: | Alistair Crooks |
During this year’s Google Summer of Code I want to improve the performance of NetBSD’s regular expression library and add support to it for wide characters.
See also: Implementing efficient wide character regular expressions
We have chosen TRE because of Ville Laurikari’s convincing master’s thesis and Russ Cox’s article on regular expressions. Ville agreed to release TRE under a 2-clause BSD licence, and we have already ported TRE to NetBSD. TRE and agrep will be integrated into the NetBSD src repository soon.
The only piece still missing is a performance benchmark, based on realistic regular expressions and data, to verify that TRE is efficient.
| Date | Milestone |
|---|---|
| 20 April 2009 | Project accepted and officially announced |
| 26 April 2009 | Published project page |
| 6 May 2009 | Added an overview of regular expression engines |
| 23 May 2009 | Initial import of the regular expression fuzzer |
| 8 June 2009 | Initial import of TRE |
| 13 June 2009 | Initial import of AT&T’s regression tests |
| 18 June 2009 | Added termcap and recursion support to agrep |
| 30 June 2009 | TRE builds as part of NetBSD’s source tree |
| 3 August 2009 | Code posted for public review |
See also: Google Summer of Code 2009 Timeline
| Name | Licence | Syntax | Algorithm | wchar_t support |
|---|---|---|---|---|
| Curie | MIT | POSIX (subset) | non-backtracking NFA | no (UTF-8 support) |
| dietlibc | GNU GPL, version 2 | POSIX (subset) | non-backtracking NFA (?) | no |
| glibc | GNU LGPL, version 2.1 | POSIX | DFA | yes |
| Hackerlab C Library | GNU GPL, version 2 | POSIX | DFA (?) | no (t_uchar support) |
| libregexp9 | MIT or Lucent Public License | regexp | non-backtracking NFA | no (UTF-8 support) |
| microregex | BSD (3-clause) | Perl | non-backtracking NFA | no |
| NetBSD libc | BSD (2-clause) | POSIX | DFA | no |
| Oniguruma | BSD (2-clause) | POSIX, Ruby | ? | no (support for various encodings) |
| PCRE | BSD (3-clause) | Perl | backtracking NFA | no |
| TRE | GNU LGPL, version 2.1 | POSIX with extensions | backtracking, parallel and approximate parallel TNFA | yes |
| T-Rex | zlib/libpng | T-Rex | backtracking NFA | yes (only UTF-8) |
| utf8regex | GNU LGPL, version 2.1 | POSIX | DFA | no (UTF-8 support) |
I have developed a fuzzer for regular expressions. It randomly generates strings from POSIX-compliant regular expressions and tests whether the regular expression library accepts each generated string. In addition, AT&T’s regression tests are used to test POSIX conformance.
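The fuzzing idea described above can be illustrated with a minimal sketch. This is not the project's fuzzer: the tiny expression tree, the function names (`to_pattern`, `sample`, `fuzz`, `python_matcher`) and the use of Python's `re` module as a stand-in for the library under test are all assumptions made for illustration. The sketch builds a pattern from a small tree of literals, concatenation, alternation and repetition, draws random strings that the pattern must accept by construction, and reports any string the matcher rejects.

```python
import random
import re

# Hypothetical expression tree (not the project's data structure):
#   ("lit", "a")         a single alphanumeric character
#   ("cat", [n1, n2])    concatenation
#   ("alt", [n1, n2])    alternation
#   ("star", n)          zero or more repetitions


def to_pattern(node):
    """Render the tree as a pattern using only (), | and *."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "cat":
        return "".join(to_pattern(n) for n in node[1])
    if kind == "alt":
        return "(" + "|".join(to_pattern(n) for n in node[1]) + ")"
    if kind == "star":
        return "(" + to_pattern(node[1]) + ")*"
    raise ValueError("unknown node kind: %r" % (kind,))


def sample(node, rng, max_repeat=3):
    """Draw a random string that the tree accepts by construction."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "cat":
        return "".join(sample(n, rng, max_repeat) for n in node[1])
    if kind == "alt":
        return sample(rng.choice(node[1]), rng, max_repeat)
    if kind == "star":
        count = rng.randint(0, max_repeat)
        return "".join(sample(node[1], rng, max_repeat) for _ in range(count))
    raise ValueError("unknown node kind: %r" % (kind,))


def fuzz(node, matcher, rounds=100, seed=0):
    """Return the pattern and every generated string the matcher rejected."""
    rng = random.Random(seed)
    pattern = to_pattern(node)
    failures = []
    for _ in range(rounds):
        candidate = sample(node, rng)
        if not matcher(pattern, candidate):
            failures.append(candidate)
    return pattern, failures


def python_matcher(pattern, string):
    """Stand-in for the library under test: whole-string match with `re`."""
    return re.fullmatch(pattern, string) is not None


if __name__ == "__main__":
    tree = ("cat", [("lit", "a"),
                    ("star", ("alt", [("lit", "b"), ("lit", "c")])),
                    ("lit", "d")])
    pattern, failures = fuzz(tree, python_matcher, rounds=200, seed=1)
    print(pattern, "rejected strings:", failures)
```

Because every generated string is in the language of the pattern, any entry in the returned failure list points at a bug in the matcher. To exercise a C library instead, the `matcher` argument would be replaced by a wrapper that anchors the pattern and calls that library's compile and match functions.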
$Id: index.html,v 1.14 2009/08/19 18:21:03 hyperyl Exp $