Index: dspam_dump.1
===================================================================
--- dspam_dump.1 (.../vendor/dspam/3.0.0) (revision 39233)
+++ dspam_dump.1 (.../trunk/email/main/bayes/regress/dspam3) (revision 39233)
@@ -1,80 +0,0 @@
-.\" -*- nroff -*-
-.\"
-.\" dspam_dump3.0
-.\"
-.\" Authors: Jonathan A. Zdziarski
-.\"
-.\" Copyright (c) 2004 Network Dweebs Corporation
-.\" All rights reserved
-.\"
-.TH dspam_dump 1 "May 31, 2004" "DSPAM" "DSPAM"
-
-.SH NAME
-dspam_dump - produce a dump of a user's metadata
-
-.SH SYNOPSIS
-.na
-.B dspam_dump
-[\c
-.BI \ username \fR
-]
-[\c
-.BI \ token \fR
-]
-
-.ad
-.SH DESCRIPTION
-.LP
-.B dspam_dump
-dumps a user's metadata dictionary to stdout. This can be used to view the
-entire contents of a user's dictionary, or used in combination with grep to view
-a subset of data. The output provides the token's stored value (in CRC64
-format), the number of spam and nonspam hits, and the token's computed
-probability.
-
-.SH OPTIONS
-.LP
-.ne 3
-.TP
-.BI \ username \fR\c
-The username of the user to dump
-
-.n3
-.TP
-.BI \ token \fR\c
-The text string of the token to search for and dump. If no token is specified,
-.B all tokens
-will be dumped to stdout.
-
-.SH EXAMPLES
-.B dspam_dump user "Subject*Viagra"
-
-Dumps the token Subject*Viagra, which represents the word Viagra in the Subject
-header, for the user specified.
-
-.SH EXIT VALUE
-.LP
-.ne 3
-.PD 0
-.TP
-.B 0
-Operation was successful.
-.ne 3
-.TP
-.B other
-Operation resulted in an error.
-.PD
-
-.SH AUTHORS
-.LP
-
-Jonathan A. Zdziarski
-
-For more information, see http://www.nuclearelephant.com.
-
-.SH SEE ALSO
-.BR dspam (1),
-.BR dspam_stats (1),
-.BR dspam_corpus (1),
-.BR dspam_clean (1),
-.BR dspam_merge (1)
Index: dspam_clean.1
===================================================================
--- dspam_clean.1 (.../vendor/dspam/3.0.0) (revision 39233)
+++ dspam_clean.1 (.../trunk/email/main/bayes/regress/dspam3) (revision 39233)
@@ -1,110 +0,0 @@
-.\" -*- nroff -*-
-.\"
-.\" dspam_clean3.0
-.\"
-.\" Authors: Jonathan A. Zdziarski
-.\"
-.\" Copyright (c) 2004 Network Dweebs Corporation
-.\" All rights reserved
-.\"
-.TH dspam_clean 1 "May 31, 2004" "DSPAM" "DSPAM"
-
-.SH NAME
-dspam_clean - perform periodic maintenance of metadata
-
-.SH SYNOPSIS
-.na
-.B dspam_clean
-[\c
-.I \-s[signature_life]\fR\c
-]
-[\c
-.I \-p[probability_life]\fR\c
-]
-[\c
-.I \-u[sl,hcl,shl,ihl]\fR
-]
-[\c
-.I \ user1 user2 ... userN \fR
-]
-
-.ad
-.SH DESCRIPTION
-.LP
-.B dspam_clean
-is used to perform periodic housecleaning on DSPAM's metadata dictionary by
-deleting old or useless data.
-
-.SH OPTIONS
-.LP
-.ne 3
-.TP
-.BI \-s\fR\c
-Performs stale signature purging. If a value is specified, the default value of
-14 days will be overridden. Specifying an age of 0 will delete all signatures
-from the user(s) processed.
-
-.n 3
-.TP
-.BI \-p\fR\c
-Deletes all tokens from the target user(s) database whose probability is
-between 0.35 and 0.65 (fairly neutral, useless data). If a value is
-specified, the default life of 30 days will be overridden. It's a good idea
-to use this flag once with a life of 0 days for users after a significant amount
-of corpus training.
-
-.n 3
-.TP
-.BI \-u\fR\c
-Deletes all unused tokens from a user's dataset.
Four different life values
-are used:
-
-.B sl
-Stale tokens which have not been used for a long period of time
-
-.B hcl
-Tokens with a total hit count below 5 (which will be assigned a hapaxial value
-by DSPAM)
-
-.B shl
-Tokens with a single spam hit
-
-.B ihl
-Tokens with a single innocent hit
-
-Ages may be overridden by specifying a format string, such as -u30,15,10,10
-where each number represents the respective life. Specifying a life of zero
-will delete all unused tokens in the category.
-
-.n 3
-.TP
-.BI \ user1\ user2\ ...\ userN\fR\c
-Specify the username(s) to perform the selected maintenance operations on. If
-no username is specified, all users are processed.
-
-.SH EXIT VALUE
-.LP
-.ne 3
-.PD 0
-.TP
-.B 0
-Operation was successful.
-.ne 3
-.TP
-.B other
-Operation resulted in an error.
-.PD
-
-.SH AUTHORS
-.LP
-
-Jonathan A. Zdziarski
-
-For more information, see http://www.nuclearelephant.com.
-
-.SH SEE ALSO
-.BR dspam (1),
-.BR dspam_stats (1),
-.BR dspam_corpus (1),
-.BR dspam_dump (1),
-.BR dspam_merge (1)
Index: dspam.pc.in
===================================================================
--- dspam.pc.in (.../vendor/dspam/3.0.0) (revision 39233)
+++ dspam.pc.in (.../trunk/email/main/bayes/regress/dspam3) (revision 39233)
@@ -1,10 +0,0 @@
-prefix=@prefix@
-exec_prefix=@exec_prefix@
-libdir=@libdir@
-includedir=@includedir@
-
-Name: DSPAM
-Description: DSPAM Anti-Spam Library
-Version: @VERSION@
-Libs: -L${libdir} -ldspam
-Cflags: -I${includedir}/dspam
Index: dspam_stats.1
===================================================================
--- dspam_stats.1 (.../vendor/dspam/3.0.0) (revision 39233)
+++ dspam_stats.1 (.../trunk/email/main/bayes/regress/dspam3) (revision 39233)
@@ -1,70 +0,0 @@
-.\" -*- nroff -*-
-.\"
-.\" dspam_stats3.0
-.\"
-.\" Authors: Jonathan A.
Zdziarski
-.\"
-.\" Copyright (c) 2004 Network Dweebs Corporation
-.\" All rights reserved
-.\"
-.TH dspam_stats 1 "May 31, 2004" "DSPAM" "DSPAM"
-
-.SH NAME
-dspam_stats - display spam statistics
-
-.SH SYNOPSIS
-.na
-.B dspam_stats
-[\c
-.BI \-H\fR\c
-]
-[\c
-.I \ username \fR\c
-]
-
-.ad
-.SH DESCRIPTION
-.LP
-.B dspam_stats
-displays the spam filtering statistics for one or all users on the system. Displays TS (Total Spams), TI (Total Innocent), TM (Total Spam Misses) and FP (Total False Positives). To calculate the total number of spams caught by DSPAM, subtract TM from TS.
-
-.SH OPTIONS
-.LP
-.ne 3
-.TP
-.BI \-H\fR\c
-Uses multi-line, human-readable output displaying the fully-qualified names for
-each class of totals, instead of their abbreviated terms.
-
-.n3 3
-.TP
-.BI \[username]\c
-Specifies the username to query. If no username is provided, all users will be
-queried.
-
-.SH EXIT VALUE
-.LP
-.ne 3
-.PD 0
-.TP
-.B 0
-Operation was successful.
-.ne 3
-.TP
-.B other
-Operation resulted in an error.
-.PD
-
-.SH AUTHORS
-.LP
-
-Jonathan A. Zdziarski
-
-For more information, see http://www.nuclearelephant.com.
-
-.SH SEE ALSO
-.BR dspam (1),
-.BR dspam_corpus (1),
-.BR dspam_clean (1),
-.BR dspam_dump (1),
-.BR dspam_merge (1)
Index: dspam.1
===================================================================
--- dspam.1 (.../vendor/dspam/3.0.0) (revision 39233)
+++ dspam.1 (.../trunk/email/main/bayes/regress/dspam3) (revision 39233)
@@ -1,218 +0,0 @@
-.\" -*- nroff -*-
-.\"
-.\" dspam3.0
-.\"
-.\" Authors: Jonathan A.
Zdziarski
-.\"
-.\" Copyright (c) 2004 Network Dweebs Corporation
-.\" All rights reserved
-.\"
-.TH DSPAM 1 "May 31, 2004" "DSPAM" "DSPAM"
-
-.SH NAME
-dspam \- DSPAM Anti-Spam Agent
-
-.SH SYNOPSIS
-.na
-.B dspam
-[\c
-.BI \--mode=[teft|toe|tum|notrain]\fR\c
-]
-[\c
-.BI \--user\ user1
-user2\ ...\ userN\fR\c
-]
-[\c
-.BI \--feature\c
-=[ch,no,wh,tb=N]\fR\c
-]
-[\c
-.B \--class\c
-=[spam|innocent]\fR\c
-]
-[\c
-.B \--source\c
-=[error|corpus|inoculation] \c
-]
-[\c
-.B \--deliver\c
-=[spam,innocent] \c
-]
-[\c
-.B \--help \c
-]
-[\c
-.B \--process \c
-]
-[\c
-.B \--classify \c
-]
-[\c
-.BI \--stdout \c
-]
-[\c
-.I \ delivery\_arguments \fR\c
-]
-
-.ad
-.SH DESCRIPTION
-.LP
-.B The DSPAM agent
-provides a direct interface to mail servers for command-line
-spam filtering. The agent can masquerade as the mail server's local delivery
-agent and will process any email passed to it. The agent will then call whatever
-delivery agent was specified at compile time or quarantine/tag/drop messages
-identified as spam. The DSPAM agent can function locally or as a proxy. It
-is also responsible for processing classification errors so that DSPAM can
-learn from its mistakes.
-
-.SH OPTIONS
-.LP
-.ne 3
-.TP
-.BI \--user\ user1 \ user2\ ...\ userN\fR\c
-Specifies the destination users of the incoming message. In most cases this is
-the local user on the system, however some implementations may call for virtual
-usernames, specific to DSPAM, to be assigned. The agent processes an
-incoming message once for each user specified. If the message is to be
-delivered, the $u (or %u) parameters of the argument string will be interpolated
-for the current user being processed.
-
-.n3 3
-.TP
-.BI \--mode= [toe|tum|teft|notrain]\c
-Configures the training mode to be used for this process:
-
-.B teft
-: Train-Everything. Trains on all messages processed. This is a very thorough training approach and should be considered the standard training approach for most users.
TEFT may, however, prove too volatile on installations with extremely high per-user traffic, or prove not very scalable on systems with extremely large user-bases. In the event that TEFT is proving ineffective, one of the other modes is recommended.
-
-.B toe
-: Train-on-Error. Trains only on a classification error, once the user's metadata has matured to 2500 innocent messages. This training mode is much less resource intensive, as only occasional metadata writes are necessary. It is also far less volatile than the TEFT mode of training. One drawback, however, is that TOE only learns when DSPAM has made a mistake - which means the data is sometimes too static, and unable to "ease into" a different type of behavior.
-
-.B tum
-: Train-until-Mature. This training mode is a hybrid between the other two training modes and provides a great balance between volatility and static metadata. TuM will train on a per-token basis only tokens which have had fewer than 25 "hits" on them, unless an error is being retrained in which case all tokens are trained. This training mode provides a solid core of stable tokens to keep accuracy consistent, but also allows for dynamic adaptation to any new types of email behavior a user might be experiencing.
-
-.B notrain
-: No training. Do not train the user's data, and do not keep totals. This should only be used in cases where you want to process mail for a particular user (based on a group, for example), but don't want the user to accumulate any learning data.
-
-.ne 3
-.TP
-.BI \--feature= [chained,noise,tb=N,whitelist] \c
-Specifies the features that should be activated for this filter instance. The following features may be used individually or combined using a comma as a delimiter:
-
-.B chained
-: Chained Tokens (also known as biGrams). Chained Tokens combines adjacent tokens, presently with a window size of 2, to form token "chains". Chained tokens use additional storage resources, but greatly improve accuracy.
Recommended as a default feature.
-
-.B noise
-: Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks in at 2500 innocent messages and provides an advanced progressive noise logic to reduce Bayesian Noise (wordlist attacks) in spams. See http://www.nuclearelephant.com/projects/dspam/bnr.html for more information.
-
-.B tb\=N
-: Sets the training loop buffering level. Training loop buffering is the amount of statistical sedation performed to water down statistics and avoid false positives during the user's training loop. The training buffer sets the buffer sensitivity, and should be a number between 0 (no buffering whatsoever) and 10 (heavy buffering). The default is 5, half of what previous versions of DSPAM used. To avoid dulling down statistics at all during the training loop, set this to 0.
-
-.B whitelist
-: Automatic whitelisting. DSPAM will keep track of the entire "From:" line for each message received per user, and automatically whitelist messages from senders with more than 20 innocent messages and zero spams. Once the user reports a spam from the sender, automatic whitelisting will automatically be deactivated for that sender. Since DSPAM uses the entire "From:" line, and not just the sender's email address, automatic whitelisting is a very safe approach to improving accuracy especially during initial training.
-
-.ne 3
-.TP
-.BI \--class= [spam|innocent] \c
-Identifies the disposition (if any) of the message being presented. This flag
-should be used when a misclassification has occurred, when the user is
-corpus-feeding a message, or when an inoculation is being presented. This
-flag should not be used for standard processing. This flag must be used in
-conjunction with the --source flag. Omitting this flag causes DSPAM to
-determine the disposition of the message on its own (the standard operating
-mode).
-
-.ne 3
-.TP
-.BI \--source= [error|corpus|inoculation] \c
-Where
-.B --class
-is used, the source of the classification must also be provided. The source
-tells dspam how to learn the message being presented:
-
-
-.B error
-: The message being presented was a message previously misclassified by DSPAM. When 'error' is provided as a source, DSPAM requires that the DSPAM signature be present in the message, and will use the signature to recall the original training metadata. If the signature is not present, the message will be rejected. In this source mode, DSPAM will also decrement each token's previous classification's count as well as the user totals.
-
-You should use error only when DSPAM has made an error in classifying the message, and should present the modified version of the message with the DSPAM signature when doing so.
-
-.B corpus
-: The message being presented is from a mail corpus, and should be trained as a new message, rather than re-trained based on a signature. The message's full headers and body will be analyzed and the correct classification will be incremented, without its opposite being decremented.
-
-You should use corpus only when feeding messages in from corpus.
-
-.B inoculation
-: The message being presented is in pristine form, and should be trained as an inoculation. Inoculations are a more intense mode of training designed to cause DSPAM to train the user's metadata repeatedly on previously unknown tokens, in an attempt to vaccinate the user from future messages similar to the one being presented. You should use inoculation only on honeypots and the like.
-
-.ne 3
-.TP
-.BI \--deliver= [innocent,spam]\c
-Tells
-.B DSPAM
-to deliver the message if its result falls within the criteria specified. For example, --deliver=innocent will cause DSPAM to only deliver the message if its classification has been determined as innocent. Providing --deliver=innocent,spam will cause DSPAM to deliver the message regardless of its classification.
This flag provides a significant amount of flexibility for nonstandard implementations.
-
-.ne 3
-.TP
-.B \--stdout \c
-If the message is indeed deemed "deliverable" by the
-.B\--deliver
-flag, this flag will cause DSPAM to deliver the message to stdout, rather than the configured delivery agent.
-
-.ne 3
-.TP
-.B \--process \c
-Tells
-.B DSPAM
-to process the message. This is the default behavior, and the flag is implied unless
-.B \--classify
-is used.
-
-.ne 3
-.TP
-.BI \--classify\c
-Tells
-.B DSPAM
-to only classify the message, and not perform any writes to the user's
-data or attempt to deliver/quarantine the message. The results of a
-classification are printed to stdout in the following format:
-
-X-DSPAM-Result: User; result="Spam"; probability=1.0000; confidence=0.80
-
-.B NOTE
-: The output of the classification is specific to a user's own data, and
-does not include the output of any groups they might be affiliated with,
-so it is entirely possible that the message would be caught as spam by a
-group the user belongs to, and appear as innocent in the output of a
-classification. To get the classification for the
-.B group
-, use the group name as the user instead of an individual.
-
-.SH EXIT VALUE
-.LP
-.ne 3
-.PD 0
-.TP
-.B 0
-Operation was successful.
-.ne 3
-.TP
-.B other
-Operation resulted in an error. If the error involved an error in calling the
-delivery agent, the exit value of the delivery agent will be returned.
-.PD
-
-.SH AUTHORS
-.LP
-
-Jonathan A. Zdziarski
-
-For more information, see http://www.nuclearelephant.com.
-
-.SH SEE ALSO
-.BR dspam_stats (1),
-.BR dspam_corpus (1),
-.BR dspam_clean (1),
-.BR dspam_dump (1),
-.BR dspam_merge (1)
-
Index: dspam_merge.1
===================================================================
--- dspam_merge.1 (.../vendor/dspam/3.0.0) (revision 39233)
+++ dspam_merge.1 (.../trunk/email/main/bayes/regress/dspam3) (revision 39233)
@@ -1,87 +0,0 @@
-.\" -*- nroff -*-
-.\"
-.\" dspam_merge3.0
-.\"
-.\" Authors: Jonathan A. Zdziarski
-.\"
-.\" Copyright (c) 2004 Network Dweebs Corporation
-.\" All rights reserved
-.\"
-.TH dspam_merge 1 "May 31, 2004" "DSPAM" "DSPAM"
-
-.SH NAME
-dspam_merge - merge several users' metadata into a composite
-
-.SH SYNOPSIS
-.na
-.B dspam_merge
-[\c
-.BI \ user1\ user2\ ...\ userN \fR
-]
-[\c
-.BI \ -o \ username \fR
-]
-
-.ad
-.SH DESCRIPTION
-.LP
-.B dspam_merge
-merges several users' metadata into a single user's dictionary. This tool
-is designed to create global users and seeded data. The hit counts for each
-token and per-user totals are added together to produce a single composite
-dataset. After creating a composite user,
-.B dspam_clean
-should be run with the -p option to clean up extraneous data.
-
-.B NOTE
-: Merges may take a considerable amount of time. This could potentially increase
-the load on the server or even slow down the delivery of email. A merge should
-only be performed when the system is fairly quiescent.
-
-.SH OPTIONS
-.LP
-.ne 3
-.TP
-.BI \ user1\ user2\ ...\ userN \fR\c
-A list of users to merge together.
-
-.n3
-.TP
-.BI \ -o \ username \fR\c
-The target user which will be created (if necessary). This user will contain
-the composite generated by the merge.
-
-.SH EXAMPLES
-.B dspam_merge dick jane spot -o ralph
-
-Merges the metadata dictionaries of dick, jane, and spot into a single
-composite under the user
-.B ralph
-.
-
-.SH EXIT VALUE
-.LP
-.ne 3
-.PD 0
-.TP
-.B 0
-Operation was successful.
-.ne 3
-.TP
-.B other
-Operation resulted in an error.
-.PD
-
-.SH AUTHORS
-.LP
-
-Jonathan A. Zdziarski
-
-For more information, see http://www.nuclearelephant.com.
-
-.SH SEE ALSO
-.BR dspam (1),
-.BR dspam_dump (1),
-.BR dspam_stats (1),
-.BR dspam_corpus (1),
-.BR dspam_clean (1)
Index: CHANGELOG
===================================================================
--- CHANGELOG (.../vendor/dspam/3.0.0) (revision 39233)
+++ CHANGELOG (.../trunk/email/main/bayes/regress/dspam3) (revision 39233)
@@ -1,3370 +0,0 @@
-Version 3.0.0
--------------
-
-[20040614.0700] jonz: fixed 14-day user graphs
-
-fixed a bug causing the 14-day user graphs to appear empty
-
-[20040612.0018] jonz: oracle storage driver fixes
-
-made several bugfixes to oracle storage driver
-added --with-oracle-version[=10] configure flag for linking to 10g libraries
-
-[20040609.0205] jonz: fixed a bug in --enable-signature-attachments
-
-fixed two bugs using --enable-signature-attachments; 1 compiler error and 1
-segfault (uninitialized value)
-
-[20040608.0715] jonz: fixed compile bug with --enable-webmail
-
-fixed compile errors resulting from --enable-webmail
-
-[20040607.1800] jonz: replaced quarantine locking with fcntl locking
-
-replaced quarantine .lock'ing with fcntl locking and also applied it to
-locking .log files. fcntl should work over NFS.
-
-[20040607.0730] jonz: fixed rare segfault (strlen on NULL)
-
-fixed a rare segfault in decode.c
-
-[20040607.0730] jonz: minor aesthetic changes to cgi
-
-minor aesthetic changes to cgi
-
-[20040606.1445] jonz: added training left option to dspam_stats -H
-
-modified dspam_stats to display # of training messages left when using -H
-command
-
-[20040606.1441] jonz: fixed bug in training threshold
-
-fixed a bug in the training threshold, which miscalculated the mail left to
-train.
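The [20040607.1800] entry above replaces .lock files with fcntl locking. As a rough illustration of that style of advisory locking (this is not DSPAM's C code), the Python sketch below appends a line to a log file while holding an exclusive lock. The function name `append_locked` and the demo path are hypothetical, and the sketch assumes a POSIX system where Python's `fcntl` module is available.

```python
import fcntl
import os
import tempfile


def append_locked(path, line):
    """Append one line to `path` while holding an exclusive fcntl lock."""
    with open(path, "a") as fh:
        # lockf() wraps fcntl(2) record locking; LOCK_EX blocks until granted.
        fcntl.lockf(fh, fcntl.LOCK_EX)
        try:
            fh.write(line.rstrip("\n") + "\n")
            fh.flush()  # make sure the data is written before unlocking
        finally:
            fcntl.lockf(fh, fcntl.LOCK_UN)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "user.log")  # hypothetical path
    append_locked(demo, "first entry")
    append_locked(demo, "second entry")
    with open(demo) as fh:
        print(fh.read(), end="")
```

Unlike a separate .lock file, the lock here is tied to the open file descriptor, so it is released automatically if the process exits before unlocking.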
-
-[20040605.1521] jonz: added statistical sedation to cgi
-
-added level of sensitivity-during-training to cgi preferences
-
-[20040605.1450] jonz: added ability to edit user preferences from admin suite
-
-added the ability to edit user preferences (and the default preferences)
-from the admin suite.
-
-[20040605.1100] jonz: fixed a bug with user processing flag
-
-fixed a bug where some parameters may be added as users instead of parameters.
-this was particularly the case if no mailer flags prepended %u.
-
-[20040604.0525] jonz: fixed blank dspam signature on reclassification
-
-fixed a problem where reclassified messages would receive:
-
-X-DSPAM-Signature: !DSPAM!
-
-fixed this by NOT stripping the old X-DSPAM-Signature header, since a new one
-is not created upon reclassification
-
-[20040604.0525] jonz: fixed untrusted.mailer_args
-
-fixed a bug where the last argument of untrusted.mailer_args was ignored.
-
-Version 3.0.0.rc2
------------------
-
-[20040603.2215] jonz: added user-logging option
-
-added --disable-user-logging option to disable user logging
-
-[20040603.0500] jonz: auto-whitelisting now works with toe-mode training
-
-added code to cause automatic whitelisting to function with toe-mode training
-
-[20040602.0030] jonz: added administration suite cgi
-
-added administration suite cgi
-
-[20040602.0030] jonz: added system logging of execution time
-
-added system logging of execution time
-
-[20040602.0025] jonz: fixed spam subject
-
-fixed spam subject headings to support variable length titles
-
-[20040601.2230] jonz: added system logging
-
-added system logging to DSPAM_HOME/system.log for future sysadmin interface
-
-[20040601.1822] jonz: removed mysql delay_key_write
-
-removed mysql's delay_key_write feature from the sql scripts, because of a
-bug in mysql that leads to database corruption when using it.
-
-[20040601.0330] jonz: added To: header parsing
-
-added --enable-parse-to-header, which will parse spam-username and fp-username
-from the To: header of a message to determine the username. This can be
-used in lieu of using spam/fp aliases by creating a wildcard subdomain
-(such as spam.yourdomain.com) and piping all email into dspam without a
---user flag, for example:
-
-wildcard: "|/usr/local/bin/dspam --mode=toe --class=spam --source=error"
-
-[20040531.2245] jonz: added pkgconfig files
-
-added installation of pkgconfig files submitted by Ronald Hummelink
-
-
-[20040531.2120] jonz: added --enable-broken-return-codes
-
-added --enable-broken-return-codes configure option which causes DSPAM to
-return an exit code of 99 if the message being processed is believed to be
-spam, 0 if not, and any other code to suggest an error has occurred. this is
-useful for some MTAs such as qmail.
-
-[20040531.2100] jonz: fixed error.h overwrite bug
-
-fixed a bug where libc's error.h would be overwritten if --prefix=/usr. DSPAM
-headers are now written to includedir/dspam.
-
-[20040531.1915] jonz: added man pages
-
-added man pages to distribution
-
-[20040531.0830] jonz: fixed header signature stripping
-
-signatures no longer stripped if --enable-signature-headers is used; to allow
-for re-re-training
-
-[20040531.0830] jonz: fixed cgi graphs falling below zero
-
-minor fix to cgi graphs preventing data points from falling below zero
-
-Version 3.0.0.rc1
------------------
-
-[20040528.0100] jonz: added logging support
-
-added support for message logging (enabled by default). logs all classification
-calls to $DSPAM_HOME/data/user/user.log. disable with --disable-logging.
-
-[20040527.2200] jonz: added new CGI
-
-added new CGI
-
-[20040527.0730] jonz: added support for profiling
-
-added support for profiling using gmon output. this allows developers to use
-profiling tools such as gprof to analyze the performance of the software.
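The [20040531.2120] entry above defines three outcomes for a build configured with --enable-broken-return-codes: 99 for spam, 0 for not spam, and anything else for an error. A calling script could interpret the exit status along these lines. This is a minimal sketch; the function name `interpret_exit_code` and the label strings are illustrative choices, not part of DSPAM.

```python
def interpret_exit_code(code):
    """Map an exit status to a label, following the CHANGELOG description."""
    if code == 99:
        return "spam"      # message believed to be spam
    if code == 0:
        return "innocent"  # message not believed to be spam
    return "error"         # any other status suggests an error


if __name__ == "__main__":
    for status in (0, 99, 1):
        print(status, interpret_exit_code(status))
```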
-
-[20040527.0730] jonz: applied patch submitted by Mark Femal
-
-applied a patch submitted by Mark Femal which:
-1. Includes select *.h files and incorporates them into the installation
-2. Fixes some issues in compiling with Sun's Pro C compiler
-3. Makes some minor changes to header files to avoid conflicts
-
-Version 3.0.0.beta.3.1
-----------------------
-
-[20040525.0830] jonz: fixed compiler error on verbose debug
-
-fixed compiler errors when verbose debug was enabled
-
-Version 3.0.0.beta.3
---------------------
-
-[20040524.2024] jonz: bugfix for null bodies
-
-applied bugfix for a segfault when the message body of some parts was
-null. rare occurrence.
-
-[20040524.1903] jonz: implemented Robinson's technique for combining p-values
-
-added support for using Robinson's technique for combining p-values, as
-described at http://www.linuxjournal.com/article.php?sid=6467. This technique
-is presently used for chi-square calculations, but using
---enable-robinson-pvalues will use this technique for *all* calculations in
-place of Graham's approach. Appears to provide slightly better results
-(on the order of 1 message per thousand).
-
-[20040524.0529] jonz: implemented *real* chi-square
-
-implement Fisher-Robinson's Inverse Chi-Square algorithm...the real stuff.
-use --enable-chi-square to use.
-
-[20040522.2350] jonz: renamed chi-square to robinson's naive bayesian
-
-renamed chi-square because it really isn't chi-square, but robinson's first
-algorithm for naive bayesian combination. use --enable-robinson to use.
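The [20040524.0529] entry above names Fisher-Robinson's Inverse Chi-Square algorithm without describing it. The Python sketch below shows the combining step as it is commonly presented in the statistical spam filtering literature. It is an assumption that DSPAM's implementation follows the same outline, and the names `chi2q` and `combine` are chosen here for illustration only.

```python
import math


def chi2q(x2, dof):
    """Upper-tail probability of chi-square for an even `dof`."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities, each strictly between 0 and 1."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_spam = sum(math.log(p) for p in probs)
    ln_ham = sum(math.log(1.0 - p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)   # near 1 when tokens look spammy
    h = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)  # near 1 when tokens look innocent
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    print(combine([0.99, 0.98, 0.95]))
    print(combine([0.02, 0.10, 0.05]))
```

A result near 1 indicates spam and a result near 0 indicates an innocent message; mixed or neutral evidence lands close to 0.5.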
-
-[20040520.0800] jonz: bugfix for attachments
-
-fixed a bug that caused message headers in attachment sections to be ignored
-
-Version 3.0.0.beta.2.1
-----------------------
-
-[20040518.0630] jonz: bugfix: seg faults on rare occasions
-
-fixed a strlen(NULL) bug that caused an occasional segfault
-
-[20040514.1130] jonz: applied dspam_genaliases patch
-
-applied dspam_genaliases patch supplied by Scott Moorhouse
- which adds the following functionality:
-
---exclude NAME Do not generate an alias for username / usernames.
---excludeuid NUM Do not generate an alias for UID / UIDS.
---minuid NUM Minimum UID for which to generate an alias.
---maxuid NUM Maximum UID for which to generate an alias.
-
-It also uses setpwent/getpwent to get passwd information instead
-of /etc/passwd. This allows the tool to be used with any default system
-authentication.
-
-[20040514.0830] jonz: modified mode=notrain to ignore signature
-
-when setting mode=notrain, the signature is NOT stored, and not appended to
-an email.
-
-Version 3.0.0.beta.2
---------------------
-
-[20040513.1845] jonz: updated configure.ac
-
-updated configure.ac to work with newer versions of autoconf (with warnings)
-
-[20040513.0157] jonz: segfault patch for sql drivers
-
-applied patch to prevent segfaults in mysql and pgsql drivers under certain
-conditions
-
-[20040512.0830] jonz: user directories moved to $DSPAM_HOME/data
-
-user directories have been moved to $DSPAM_HOME/data. it will be necessary to
-move all user directories into this folder when upgrading
-
-[20040512.0830] jonz: default $DSPAM_HOME changed
-
-default dspam home has been changed from /etc/mail/dspam to /var/dspam. use
---with-dspam-home to change this.
-
-[20040512.0830] jonz: patch for sql drivers
-
-applied patch for mysql and pgsql drivers to prevent errors in sql due to
-lack of commas
-
-Version 3.0.0.beta.1.2
-----------------------
-
-[20040504.1835] jonz: bugfix for signed message signature
-
-corrected a bug where the boundary for a signed message would be missing
-a carriage return.
-
-[20040504.0548] jonz: bugfix for token storage bug
-
-fixed a token storage bug, where some tokens would not be stored if they
-were preceded by a token that was found in the database
-
-[20040503.0830] jonz: bugfix for corpus spam delivery
-
-fixed a bug where corpusfed messages would be delivered if a quarantine agent
-was specified at configure time.
-
-[20040501.1052] jonz: added spam-subject feature
-
-added a spam-subject feature which can be activated with --enable-spam-subject.
-when enabled, DSPAM will prepend [SPAM] to the subject headers of all messages
-suspected to be spam.
-
-Version 3.0.0.beta.1.1
-----------------------
-
-[20040501.0630] jonz: fixed critical problems with pgsql_drv driver
-
-fixed a critical problem with the postgres storage driver to correct sql errors
-in processing
-
-Version 3.0.0.beta.1
---------------------
-
-[20040430.0800] jonz: fix for sql driver subtractions
-
-implemented GREATEST(0, [Argument] ) functions for subtractions, which fixes a
-problem in which error corrections are not made to tokens where there are
-zero hits for the classification being subtracted from. should also
-definitively prevent negative values in hit totals.
-
-[20040430.0800] jonz: bugfix: corpus feeding invoked test-conditional training
-
-fixed a bug where corpus feeding would invoke test-conditional training.
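The "fix for sql driver subtractions" entry above wraps each subtraction in GREATEST(0, ...) so that a hit counter cannot drop below zero. The same clamp expressed in Python looks like this. The function name `decrement_hits` is hypothetical and the sketch only illustrates the arithmetic, not the SQL drivers themselves.

```python
def decrement_hits(current, amount=1):
    """Subtract `amount` from a hit counter, never returning less than zero."""
    return max(0, current - amount)


if __name__ == "__main__":
    print(decrement_hits(3))     # ordinary decrement
    print(decrement_hits(0))     # already at zero, stays at zero
    print(decrement_hits(1, 5))  # would go negative, clamped to zero
```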
-
-[20040430.0800] jonz: test-conditional training to subtract only once
-
-test-conditional training modified to subtract from misclassified corpus only
-once, and corpus feed for all other iterations
-
-[20040430.0800] jonz: fixed bug in sql-drivers/test-conditional training
-
-fixed a bug in the sql drivers where test condition training would make
-exponential changes instead of incremental. this was due to not resetting
-the control token on every call to _ds_getall_spamrecords.
-
-[20040430.0745] jonz: fixed bug in web stats
-
-fixed bug where merged group web stats wouldn't get written
-
-[20040430.0730] jonz: fixed bug in TOE totals
-
-fixed a bug where spam/innocent classified wasn't updated when TOE was used
-
-[20040427.0433] jonz: fixed bug in mysql and pgsql drivers
-
-fixed a bug in mysql and pgsql drivers where dspam_merge was functioning
-incorrectly, due to the token count on record insertion being set to 1 or 0,
-and not the actual token value.
-
-[20040427.0155] jonz: merged groups shouldn't merge with themselves
-
-corrected a situation where the actual user in a merged group could be merged
-with themselves, if they were the target user.
-
-[20040427.0119] jonz: applied bdb patch for solaris
-
-applied a patch for building on Solaris 9 with BDB drivers
-
-[20040425.0757] jonz: updated pgsql drivers
-
-applied pgsql_drv storage driver updates submitted by Rustam Aliyev
-
-Version 3.0.0.alpha.6
----------------------
-
-[20040424.2235] jonz: fixed header tokenization
-
-fixed header tokenization from previous alpha; was suddenly leaving out
-heading from token names.
-
-[20040424.1427] jonz: added merged groups
-
-merged groups are similar to global groups, only instead of the global user
-being used in lieu of per-user statistics, the global user in a merged group
-is merged with the user's own training data. this allows immediate correction
-to take place and no training loop.
-
-NOTE: merged groups are storage driver dependent.
presently they have only
-been implemented for the mysql driver.
-
-[20040422.1900] jonz: messages with empty bodies should still be processed
-
-fixed bug where messages with empty bodies failed into delivery
-
-[20040422.1829] jonz: added encoding strip patch
-
-added patch to fix the stripping of the content-transfer-encoding
-
-[20040421.1809] jonz: added training mode 'notrain'
-
-added training mode 'notrain' which will process the message, but not train any
-user data; this is ideal for implementations where a global dictionary is
-used, but the administrator doesn't want to accumulate training data for each
-user.
-
-[20040421.0310] jonz: fixed TOE-mode totals updating
-
-fixed bug where TOE-mode would update totals when it shouldn't
-
-Version 3.0.0.alpha.5
----------------------
-
-[20040421.0100] jonz: fixed totaling problems with classification groups
-
-fixed totaling problems with global users and classification groups, where
-spams wouldn't get counted, and some innocents
-
-[20040421.0100] jonz: fix for dspam_stats
-
-fix for dspam_stats, identifying individual users
-
-[20040420.0734] jonz: fix for builds on Solaris w/BDB
-
-fixed compiler error when building on Solaris w/BDB drivers
-
-[20040419.0758] jonz: fix for X-DSPAM-Result header problem with TOE
-
-TOE resulted in the X-DSPAM-Result being sent to stdout, which broke all
-implementations of TOE where --stdout was used. bug fixed.
-
-[20040419.0700] jonz: added support for multipart/encrypted messages
-
-added the same support for multipart/encrypted messages as is provided
-for multipart/signed
-
-[20040418.1840] jonz: changes to pgsql objects
-
-changes to pgsql objects to fix performance issues
-
-[20040417.1105] jonz: more global user tweaks
-
-if the global user thinks the message was innocent, but the user thinks it was
-spam, retrain the message as a false positive into the user's dictionary
-automatically, but don't update FP totals (internal function)
-
-[20040417.1050] jonz: implemented totals checking
-
-implemented totals checking to ensure no totals travel below 0
-
-[20040417.1045] jonz: don't retrain some classification catches
-
-patch added not to retrain some spams in a global user catch if the user's
-own dictionary already learned it as spam
-
-[20040417.1037] jonz: patch for non-user creation
-
-patch made to sql-based drivers to avoid creating virtual users in cases where
-a message isn't being directly processed (e.g. tools, error correction, etc.)
-
-[20040417.2006] jonz: added human-readable patch to dspam_stats
-
-added patch for human-readable format to dspam_stats, submitted by Alan
-Shields
-
-Version 3.0.0.alpha.4
----------------------
-
-[20040416.0000] jonz: fix for global users to prevent FPs
-
-applied bugfix for global users code where false positives were getting
-generated because the user's dictionary wasn't completely ignored.
-
-[20040416.0000] jonz: applied dspam_corpus division by zero patch
-
-applied div by zero patch for dspam_corpus submitted by Nick Burnett
-
-[20040415.0010] jonz: added end-of-token truncated symbols
-
-added support for end-of-token symbols, such as exclamation point. slight
-boost in accuracy in testing.
- -[20040414.0052] jonz: added abbreviated feature references - -the first two letters of a feature can be used alternatively instead of the -whole feature name; for example --feature=ch,no,wh - -[20040411.0100] jonz: added X-DSPAM-Confidence header - -added X-DSPAM-Confidence header to all processed messages to identify the -confidence level of the decision made. - -[20040410.0930] jonz: tum maturity level increased to 50 hits - -train-until-mature level increased from 25 hits to 50; doesn't appear to work -well in classification groups. - -[20040409.0201] jonz: added support for domain scale - -added support for domain scale applying patches submitted by -Patrick Tudor - -[20040409.0153] jonz: applied pgsql patches - -applied more pgsql patches - -[20040409.0129] jonz: fixed headers to preserve original encoding - -headers are now delivered with original encodings - -[20040407.2254] jonz: added mass false positive button to CGI - -added a button to reverse multiple false positives by clicking on checkboxes. - -[20040407.2248] jonz: fixed bug in classification groups - -fixed a bug in classification groups, where a "classify catch" would cause -the DSPAM signature to be empty, and thus irreversible. - -[20040407.0255] jonz: tweaks to postgres m4 - -tweaks to postgres m4 to test headers and library on configure - -Version 3.0.0.alpha.3 ---------------------- - -[20040406.0124] jonz: suppress extra newline in message body - -corrected message reassembly behavior by suppressing newline characters at the -end of the message body. - -[20040405.0524] jonz: added postgresql driver to project - -added pgsql_drv (PostgreSQL) submitted by Rustam Aliyev -to project, added to configure with its own set of configuration commands. -see tools.pgsql/README for more information. Applied recent SQL fixes.
- -[20040405.0330] jonz: virtual users should not be created on reclassification - -if a message is being submitted for reclassification, a virtual user should not -be created, but fail instead - e.g. spam could be getting sent to the alias, -and shouldn't create new uids. - -[20040405.0233] jonz: fixed SQL-driver hits-below-zero bug - -fixed a bug causing some tokens to drop below zero hits using the mysql -driver. - -[20040405.0149] jonz: fixed BNR bug - -fixed a bug caused by Bayesian Noise Reduction which caused some messages -never to get learned if the control token was filtered; or caused filtered -tokens never to be learned. - -[20040403.1745] jonz: rewrite of libdspam API - -rewrite of libdspam's API. in short: - -- Operating modes DSM_ADDSPAM and DSM_FALSEPOSITIVE dropped -- CTX->classification added: DSR_ISSPAM | DSR_ISINNOCENT | DSR_NONE -- CTX->source added: DSS_ERROR | DSS_INOCULATION | DSS_CORPUS | DSS_NONE - -provides a much cleaner and less ambiguous interface - -[20040403.1215] jonz: removed signature deletion - -removed signature deletion from agent, so messages can be re-re-classified. -also prevents mysql errors. - -[20040403.1125] jonz: added dotfile debugging support - ---enable-debug and --enable-verbose-debug flags now require a .debug file -to be dropped in order to log debug messages, providing you with the ability -to dynamically activate/deactivate debug messages for some or all users. A -.debug file can either be dropped in DSPAM_HOME to activate debugging for all -users, or a username.debug file can be dropped in DSPAM_HOME/userpath/ to -activate debugging for a subset of users. 
- -[20040402.1839] jonz: added support for domain-name groups - -added support for groups based on domain name - -Version 3.0.0.alpha.2 ---------------------- - -[20040402.0730] jonz: improved agent classification output - -agent classification output improved to include username, result, probability, -and confidence level in MIME format for easy parsing - -[20040402.0730] jonz: added broken MTA support - ---enable-broken-mta -You should enable this if your MTA is broken and passes messages into DSPAM -with CTRL-M's (^M) in them. - -[20040402.0730] jonz: added training loop buffering feature - -Training loop buffering is the amount of statistical sedation performed to -water down statistics and avoid false positives during the user's training loop. -The training buffer sets the buffer sensitivity, and should be a number -between 0 (no buffering whatsoever) and 10 (heavy buffering). The default is 5, -half of what previous versions of DSPAM used. To avoid dulling down -statistics at all during the training loop, set this to 0. - -The training buffer can be set using tb=N as a feature, where N is the level of -buffering (0-10). For example: - ---feature=chained,noise,tb=10 - -Causes the buffer level to be set to 10, the highest level of safety, whereas - ---feature=chained,noise,tb=0 - -Removes all buffering constraints - -[20040402.0723] jonz: fixed bug in dspam_dump - -fixed a bug in dspam_dump causing unknown tokens to be displayed with -uninitialized values - -[20040402.0720] jonz: fixed bug in agent for signature dropping - -when a signature can't be found, the message is dropped; unfortunately the -agent forgot to shut down the dspam context which caused BDB to lock up. - -[20040402.0700] jonz: added switch for webmail - -The webmail switch is designed for systems where the original message remains -server side and can therefore be presented in pristine format for retraining.
- - --enable-webmail - The webmail switch is designed for systems where the original message - remains server side and can therefore be presented in pristine format for - retraining. This option will cause DSPAM to cease all writing of - signatures and DSPAM headers to the message, and deliver the message in as - pristine format as possible. This mode REQUIRES that the original message - in its pristine format (as of delivery) be presented for retraining, as in - the case of webmail or other applications where the message is actually - kept server-side during reading, and is preserved. DO NOT use this switch - unless the original message can be presented for retraining with the - ORIGINAL HEADERS and NO MODIFICATIONS. - -[20040401.2243] jonz: fix for signature headers - -applied patch to fix multipart boundary bug when signature-headers is enabled - -Version 3.0.0.alpha.1 ---------------------- - -[20040401.1230] jonz: patches to corpus locking - -made patches for corpus locking, to help prevent corruption with BDB drivers. -DSPAM agent now drops a .corpuslock file upon processing a corpus which in -turn tells the drivers not to run automatic recovery. this should prevent -corruption when an email comes in while you are corpus training with the BDB -drivers. this was not an issue with the SQL-based drivers. - -[20040401.1230] jonz: deleted libdb4_purge, libdb3_purge - -libdb4_purge and libdb3_purge have been obsoleted by the new rewritten -dspam_clean tool - -[20040401.0720] jonz: extended group line length to 10k - -extended length of a single group line to 10k, from 1k - -[20040401.0720] jonz: new dspam_clean functionality - -dspam_clean has been rewritten to support the following different clean -operations: - -1. Using the -s flag, dspam_clean will continue to perform stale signature - purging. If an age is specified, for example -s14, the age defined as the - default will be overridden. 
Specifying an age of 0 will delete all - signatures for the users processed. - -2. Using the -p flag, dspam_clean will delete all tokens from a user's database - whose probability is between 0.35 and 0.65 (fairly neutral, useless tokens) - that fall beyond the default age. If an age is specified, for example - -p30, the age defined as the default will be overridden. It is a good - idea to use this type of clean with an age of 0 on users after a lot of - corpus training. - -3. Using the -u flag, dspam_clean will delete all unused tokens from a user's - database. There are four different types of unused tokens: - - - Tokens which have not been used for a long time - - Tokens which have a total hit count below 5 - - Tokens which have only one spam hit - - Tokens which have only one innocent hit - - Ages may be overridden by specifying a format such as -u30,15,10,10 - where each number represents the respective age. Specifying an age of - zero will delete all unused tokens in the category. - -Optionally, usernames may be specified to override the default behavior of -processing all users. - -Examples: - -Process all users on the system using all clean operations: - dspam_clean -s -p15 -u90,30,15,15 - -Delete all of user 'dick' and 'jane's signatures - dspam_clean -s0 dick jane - -Perform a post-corpus training clean on user 'spot' - dspam_clean -p0 -u0,0,0,0 spot - -Perform nightly maintenance using all default values, for all users, with all -options enabled: - dspam_clean -p -u -s - -NOTE: You may wish to only run certain cleaning modes depending on the type of -storage driver you are using. For example, the MySQL storage driver -includes a purge.sql script which performs signature and unused operations, -leaving only the probability operation as a useful operation. If you are -using a SQL-based storage driver, it is strongly recommended that you use -the maintenance scripts wherever possible.
- -[20040401.0720] jonz: added _ds_delall_spamrecords and _ds_del_spamrecord - -added spamrecord deletion functionality to storage driver, increased version -to 5:0:0 - -[20040331.2000] jonz: applied some memory leak patches - -applied some memory leak patches submitted by -William Ahern - -[20040328.2200] jonz: renamed USERDIR to DSPAM_HOME - -all references to USERDIR are now known as DSPAM_HOME, including the ---with-dspam-home configure flag, and mode settings. - -[20040328.2200] jonz: moved several features to commandline - -many features have been REMOVED from the configure script and into the -commandline including chained tokens, bayesian noise reduction, automatic -whitelisting, and training modes. please see the documentation for a complete -list of commandline arguments. - -configure functions which have changed: - ---with-userdir-* changed all to dspam-home ---with-local-delivery-agent changed to --with-delivery-agent ---enable/disable-chained-tokens removed from configure ---enable/disable-bnr removed from configure ---enable/disable-whitelist removed from configure ---enable/disable-toe removed from configure ---enable/disable-tum removed from configure ---enable/disable-spam-delivery removed from configure ---enable-deliver-to-stdout removed from configure - -[20040328.1745] jonz: completely reworked commandline arguments - -please see documentation for new commandline arguments. 
- -[20040328.1745] jonz: removed free-pass of arguments by untrusted users - -removed ability to pass in arguments by untrusted users, when the file -untrusted.mailer_args didn't exist - -[20040327.2230] jonz: CGI to allow logo-click to return - -changed CGI to allow a click on the DSPAM logo to return the user to the -main page - -[20040327.2222] jonz: thresholds to include all totals - -thresholds changed to include all 3 totals: learned, classified, corpusfed - -[20040327.2221] jonz: test-conditional training threshold dropped - -test-conditional training threshold dropped to 1000 messages - -[20040326.0730] jonz: extended DAF flagset - -extended DAF flagset to four bytes - -[20040326.0730] jonz: temporarily removed blackbox framework - -archived and removed blackbox framework from cvs; not likely i'll be working -on it any time soon - -[20040325.2129] jonz: extended context flags to u_int32_t - -extended context flags to 4 bytes, to add additional commandline features - -[20040325.2129] jonz: compatibility fixes for TOE - -compatibility fixes for TOE for web client and stats - -[20040325.1939] jonz: code cleanup - -commented headers, cleaned up code - -[20040325.1930] jonz: converted total_spam, total_innocent - -converted total_spam, total_innocent to spam_learned, innocent_learned, and -added spam_classified, innocent_classified for stats use with TOE. 
- -NOTE: changes are required to SQL-based drivers for this version - -MySQL Example: - -alter table dspam_stats add spam_learned int; -alter table dspam_stats add innocent_learned int; -alter table dspam_stats add spam_classified int; -alter table dspam_stats add innocent_classified int; -update dspam_stats set spam_learned = total_spam; -update dspam_stats set innocent_learned = total_innocent; -update dspam_stats set spam_classified = 0; -update dspam_stats set innocent_classified = 0; -alter table dspam_stats drop column total_spam; -alter table dspam_stats drop column total_innocent; -alter table dspam_stats add spam_misclassified int; -alter table dspam_stats add innocent_misclassified int; -update dspam_stats set spam_misclassified = spam_misses; -update dspam_stats set innocent_misclassified = false_positives; -alter table dspam_stats drop column spam_misses; -alter table dspam_stats drop column false_positives; - -[20040325.1930] jonz: addspam to fail on failed signature retrieval - -due to a lot of misconfigurations of dspam, addspam will now fail if a -signature cannot be retrieved. this should help pinpoint problem installs -and clients, and prevent poor accuracy. - -Version 2.11.1 --------------- - -[20040325.0757] jonz: added --help - -added --help commandline argument - -[20040325.0757] jonz: fixed division by zero bug in dspam.cgi - -small chance of division by zero bug fixed - -[20040325.0740] jonz: fixed toe - -fixed toe, which has been accidentally disabled in testing - -[20040325.0740] jonz: provided runtime arguments for training mode - -added run-time arguments --toe --tum --teft to specify training mode. the -default is based on configure-time options. - -also added training_mode variable to dspam context, should not affect -compatibility. - -Version 2.10.2 --------------- - -[20040319.2138] jonz: added shell quoting of special characters - -special characters are now quoted, instead of filtered, when calling the LDA. 
- -Version 2.11.0 / Version 2.10.2 -------------------------------- - -[20040319.1845] jonz: fixed bash special characters problem - -fixed special characters problem in bash by encapsulating all arguments in -quotes - -[20040319.0730] jonz: added train-on-mature training option - ---enable-tum -train-on-mature (TuM) is a hybrid of train-everything and train-on-error. -all tokens are candidates for training as in train-everything, but only tokens -whose total number of "hits" don't exceed 100 are trained. on error, all -tokens are trained. this provides a good balance between the volatility of -train-everything and the lack of behavioral learning in train-on-error. it -also has the added benefit of not breaking the things that toe presently -breaks in dspam (whitelists, stats, etc). - -[20040319.0700] jonz: fixed source address bug - -fixed a bug in source address tracking where messages were reported as innocent -even if they were guilty, if the user had < 2500 messages in corpus - -[20040318.1932] jonz: fixed compile-time warning in dspam_tools.c - -fixed warning for uninitialized crc variable - -[20040318.0259] jonz: post-training features dropped to 2500 - -post-training features such as TOE and BNR have had their prerequisite ham count -dropped from 4000 to 2500. - -[20040318.0241] jonz: fixed up headers so developers only need libdspam.h - -fixed up header dependencies so developers only need include libdspam.h to -use libdspam. - -[20040318.0124] jonz: added support for header-based signatures - -for implementations where a signature in the body is unacceptable, using ---enable-signature-headers will place the signature in the header, and not -in the body. - -IMPORTANT: This will -require- that the headers be forwarded with the message -when being reported as spam. This usually requires bouncing the message, -forwarding it as an attachment, or using a macro. The header will otherwise -be lost with standard forwarding.
- -[20040316.2315] jonz: added support for userlist termination - -userlist can now be terminated using -- - -Version 2.10.1 --------------- - -[20040314.0128] jonz: bugfix for segfaults in dspam.c - -segfaults can occur on some systems (predominantly Solaris) when mail is sent -to multiple local recipients. bugfix required the header insert pointer to -be reset. - -Version 2.10.0 --------------- - -[20040307.1828] jonz: new dspam_corpus tool by Gary Funck - -replaced old dspam_corpus tool with a better one contributed by Gary Funck - - -[20040305.0320] jonz: added postfix documentation - -added documentation for postfix local delivery - -[20040305.0320] jonz: added support for domain filesystem structure - -use of --enable-domain-scale configures filesystem for domain-based -support. when used, username@domain should be passed in as the userid and -$USERDIR/domain/username/ will be used instead of $USERDIR/username or -$USERDIR/u/us/username as done with large scale - -[20040303.2208] jonz: applied bugfix patch by dennis pedersen - -applied a bugfix to libdb3 and libdb4 fixing a bug that was presented in rc2 -causing loop hangs. submitted by dennis pedersen - -[20040303.0243] jonz: added long username support - -by default, the username length uses the same limits as the operating system. -if --enable-long-usernames is specified, however, the limit will be set to -256. - -Version 2.10-rc2 ----------------- - -[20040302.0007] jonz: implemented auto-whitelisting - -implemented auto-whitelisting using --enable-whitelist function. automatic -whitelisting will automatically whitelist any full 'From' addresses (including -the name) that have appeared in at least 10 innocent messages and zero spams. -when a message is forwarded as a spam, any automatic whitelisting for that -address is permanently deactivated. - -[20040301.2339] jonz: fixed purge.sql - -fixed some bugs in MySQL's purge.sql, optimized for speed thanks to another -patch submitted by bob glamm. 
- -[20040229.1245] jonz: applied patch submitted by Sascha Blank - -applied patch submitted by Sascha Blank for dspam_dump to allow lookup of -individual tokens. - -[20040228.1618] jonz: train-on-error to perform source address tracking - -train-on-error mode fixed to perform source address tracking - -[20040224.2008] jonz: fixed high cpu utilization on large messages - -fixed an iteration problem which caused high cpu utilization on large (2MB+) -text messages - -[20040223.0350] jonz: fixed compile error in libdspam.c - -fixed compile error in libdspam.c when HAVE_ISO_VARARGS isn't defined - -Version 2.10-rc1 ----------------- - -[20040222.1606] jonz: added support for global groups - -global groups allows DSPAM to provide a "SpamAssassin type out-of-the-box -filtering" for all new users until they have built their own useful -dictionaries. to create a global classification group, add something like -this to $USERDIR/group: - -groupname:classification:*globaluser - -This will automatically add globaluser as a classification peer to all users. -Any user who has less than 1000 innocent messages or 250 spam messages in -their corpus, or whose filter is uncertain about a particular message will -consult the global dictionary for an answer. - -global groups will need to be trained using corpus or other means, or by -using the dspam_merge tool. the global user (in this case 'globaluser') is -treated just as any other user on the system. - -[20040221.2155] jonz: format changes to dspam_dump - -dspam_dump formatting changes + display of token probability - -[20040220.1700] jonz: added quick fix for \r stripping in dspam_corpus - -added a quick fix to strip \r's in mailboxes when using dspam_corpus - -[20040220.1700] jonz: fixed segfault bug - -fixed a bug that caused DSPAM to segfault on empty MIME delimiters. This -generally only occurred with spams, as legitimate messages have RFC-compliant -delimiters.
- -[20040219.0150] jonz: added support for neural networking - -see README for more details - -[20040218.2300] jonz: added tweaking to BNR for small text samples - -added tweaking of thresholds to BNR for small text samples < 3.5k - -[20040217.0724] jonz: fixed some miscellaneous compile warnings - -fixed some miscellaneous compile warnings. 2 for when trusted user security -is disabled, 1 for dspam_2mysql.c:126 - -Version 2.10-beta-2 -------------------- - -[20040214.1632] jonz: added TOE support - -added TOE (Train on Error) support using the --enable-toe configure function. -see the README file for more details. - -[20040213.1549] jonz: fixed X-DSPAM header duplication bug - -fixed a bug which caused X-DSPAM headers to be cumulatively appended when -a single message addresses multiple local users. - -[20040214.1327] jonz: added --enable-client-compression configure flag - -added option --enable-client-compression to use compression option between -data source and its clients (where available). presently only available with -the mysql_drv storage driver. you should enable this if the data source -is on a separate machine from the DSPAM agent(s), as it conserves bandwidth -at the expense of a few CPU cycles. - -[20040214.1258] jonz: created speed and space optimized MySQL scripts - -created both speed and space optimized mysql_objects.sql scripts. - -[20040214.1235] jonz: added new stats to CGI - -added FP stats + overall accuracy to CGI - -[20040214.1235] jonz: added debug output for noise filtering - -added noise level, spammy tokens, and eliminations to debug output - -Version 2.10-beta-1 -------------------- - -[20040212.2208] jonz: added stale data purge / PURGE_ANY - -added stale data purge to libdb3 and libdb4 purge tools. based on PURGE_ANY, -defined in config.h, any stale data is removed after six months. - -[20040212.2205] jonz: added DSF_NOISE flag - -added DSF_NOISE flag to libdspam interface for activating Bayesian Noise -Reduction.
- -[20040211.0158] jonz: disabled mysql_drv _ds_delete_signature - -disabled _ds_delete_signature in mysql_drv due to errors; added signature -purge to purge.sql script. no longer necessary to run dspam_clean if using -the mysql storage driver. - -[20040211.0155] jonz: mysql_drv get_one update - -check to ensure there was at least one token to be loaded, otherwise do not -perform query - -Version 2.9.6 -------------- -[20040208.1906] jonz: bugfix for BNR - -BUGFIX: when BNR is activated on users with < 4000 innocent -messages, the filter forgets to load token stats for the user and marks -all messages as innocent. - -Version 2.9.5 -------------- -[20040204.0413] jonz: implemented Bayesian Dolby - -implemented Bayesian Noise Reduction -(see http://www.nuclearelephant.com/projects/dspam/bnr.html) - -[20040202.2216] jonz: added multipart frequency thresholds - -body tokens in multipart messages now require a minimum frequency of 2 to be -included in the calculation. - -[20040128.2021] jonz: only report source-addresses in mature corpuses - -only report source-addresses when the user has >4000 innocent messages in -their corpus. - -Version 2.9.4 -------------- - -[20030128.0334] jonz: added DSPAM SBL dropfile support - -added support to source address tracking to drop SBL files to /var/spool/sbl -if exists, where client in directory watch mode can read. - -Version 2.9.3 -------------- - -[20040122.0700] jonz: hex decoding - -a small piece of code to perform hex-decoding on 8bit encodings. very useful, -although hex encoding is still somewhat rare. - -[20040121.0805] jonz: new stats watering-down code for high-spam users - -implemented new code for watering down statistics during the learning phase to -compensate for users with a high percentage of spam. this should only affect -accuracy of normal (average spam) users for the first 1000 messages. -significant watering down takes place up to 1000 spams.
limited watering -down takes place up to 2500 spams if the user has more spam in their corpus -than innocent mail. - -[20040121.0805] jonz: priority given to complex tokens - -slight code tweak to give priority to more complex tokens (e.g. chained -tokens) to help improve accuracy. - -[20030121.0805] jonz: signature should not be stored when using --corpus - -signatures are no longer stored when using the --corpus flag - -Version 2.9.1 ------------- - -[20031220.1442] jonz: added notification emails - -three different notification emails can be configured to get sent: - -- to a user the first time they receive a message through dspam (first run) -- to a user the first time a spam is caught through dspam on their behalf -- to a user when their quarantine box is > 2MB in size - -to use notification emails, copy the txt/ directory from the distribution -into USERDIR and configure the emails accordingly. more information is -available in the README. - -Version 2.8.1 -------------- - -[20031205.0821] jonz: html preformatting only for html parts - -html preformatting to be done only to html parts; html comments in -plain text parts should not be filtered out. - -[20031205.0156] jonz: high-byte tokens not ignored - -fixed a small bug causing tokens consisting of all high-bytes to be -ignored. - -[20031205.0122] jonz: tweaked cgi spam ratio - -tweaked cgi spam ratio to include misclassifications - -[20031130.1016] jonz: dspam_merge to corpusfy totals - -dspam_merge now moves all totals to corpusfed, so that a merged user can -easily start with fresh stats. - -[20031129.1619] jonz: fixed quarantine agent arg skip bug - -fixed minor bug which caused some arguments to be skipped when using a custom -quarantine agent - -[20031129.1443] jonz: implemented opt-in/opt-out storage directory - -moved all user.dspam and user.nodspam files to USERDIR/opt-in and -USERDIR/opt-out, respectively. this saves from needing to have and set up -a directory for each user.
- -Version 2.8 ------------ - -[20031126.1633] jonz: stepped down insert query error to debug info - -stepped down the query error on insert down to debug info, as it is a common -occurrence on busy servers. - -[20031124.0523] jonz: corrected buffer overrun in BDB drivers - -corrected buffer overrun vulnerability in BDB drivers dealing with copying -tokens into memory. discovered when working with corrupt dictionaries which -caused segfaults. the dictionary would have to be manipulated in order to -exploit, so risk was minimal. - -[20031124.0459] jonz: fixed bug in dspam_2mysql - -dspam_2mysql failed to place quotes around token value. - -[20031123.1351] jonz: fixed libdb4,libdb3 shared group bug - -fixed a bug that caused shared groups to fail with the following error: - -DB_ENV->open failed: No such file or directory - -[20031120.0405] jonz: fixed HTML boundary corruption with signature removal - -fixed a bug that caused boundary corruption after an HTML part where a DSPAM -signature from a previous reply was removed by the agent. - -[20031120.0405] jonz: do not remove old signatures from signed messages - -corrected the dspam agent so that older signatures from signed messages were -not parsed out. this caused the message to fail to authenticate. - -Version 2.8-rc-1 ----------------- - -[20031115.2042] jonz: fixed minor memory leak on initialization failure - -minor memory leak caused in libdspam when dspam_init fails. does not affect -DSPAM agent, only library. - -[20031115.2042] jonz: DSM_CLASSIFY generated truncated signatures - -fixed a bug where DSM_CLASSIFY generated truncated signatures - -[20031115.1540] jonz: corrected multipart analysis bug - -corrected a bug that caused parts of a multipart message that were not -specifically marked as text with the "Content-Type" header to be ignored from -analysis.
- -[20031114.1949] jonz: corrected DSM_CLASSIFY in-memory totals bug - -corrected a bug that changed in-memory totals when DSM_CLASSIFY was used - -[20031113.1938] jonz: corrected DSM_CLASSIFY bug in libdspam - -corrected two bugs in libdspam regarding the DSM_CLASSIFY mode: - -1. CTX->signature would overwrite the provided signature with a new signature - resulting in a potential memory leak - -2. If no signature was provided, DSM_CLASSIFY would segfault instead of create - a new signature - -Version 2.8-beta-2 ------------------- - -[20031103.1119] awn: libdspam version changed to the '4:0:0' - -libdspam version changed to the '4:0:0' because introducing and -requiring dspam_init_driver() at start and dspam_shutdown_driver() at -end is a backward-incompatible change. - -[20031031.0402] jonz: fixed web stats for shared groups - -shared group webstats fixed - -[20031031.0340] jonz: added commandline options - -added --stdout commandline option to deliver messages to stdout -added --deliver-spam commandline option to deliver spams to user's mailbox -changed --deliver flag to --deliver-fp, although --deliver still supported - for backward compatibility. option still only necessary when configuring - with --enable-spam-delivery - -[20031031.0324] jonz: changed default configure options - -enabled the following as defaults in configure: - -alternative-bayesian (alternative Bayesian algorithm) -test-conditional (test-conditional, iterative based training) - -[20031030.1120] jonz: fixed caching bug - -fixed caching bug in mysql_drv driver and ora_drv drivers causing dspam_stats -to return stats for first user, as stats for all users - -[20031029.0538] jonz: added --classify commandline flag - -the --classify commandline flag will classify the input message and output -to stdout "SPAM" or "HAM" depending on the result. No changes will be made -to the user's tokens or totals.
- -[20031029.0538] jonz: changed totals mechanism - -the following changes have been made to the totals mechanism: - -- spam_misses has been changed to spam_misclassified -- false_positives has been changed to innocent_misclassified -- spam_corpusfed and innocent_corpusfed have been added - -IMPORTANT UPGRADE NOTE: Please see the README for information on updating your -SQL databases to accept these changes if you are using a SQL-based driver. If -you are using a BDB-based driver, these changes will automatically be -implemented. - -[20031028.2000] jonz: corrected CLASSIFY bug in mysql_drv and ora_drv - -corrected a significant bug in mysql_drv and ora_drv which caused tokens and -totals to be incremented on all CLASSIFY calls. - -[20031028.2000] jonz: changed DSF_CLASSIFY (flag) to DSM_CLASSIFY (mode) - -the DSF_CLASSIFY flag is now a mode called DSM_CLASSIFY. - -Version 2.8-beta-1 ------------------- - -[20031028.0531] jonz: added customizable header for cgi - -cgi spam account now has customizable header - -[20031028.0448] jonz: classification catches to add as spam - -spam catches by a member of a classification group should result in the -message being added as spam, as opposed to innocent. this has been corrected. - -[20031028.0204] jonz: X-DSPAM-User header only considered in managed groups - -the X-DSPAM-User header field is only paid attention to when the user is -a member of a managed group (the only time where the original user is -necessary). - -the parsing of the X-DSPAM-User header has also been corrected to chomp the -newline character, which was resulting in some systems including the character -in the username. - -[20031028.0116] jonz: corrected a critical error in classification groups - -corrected a critical error in classification groups causing DSPAM to crash -(and the message get delivered by the MTA's failsafe in most cases) when a -user in a classification group resulted in a spam being caught. 
- -[20031027.0137] jonz: added mta whitelists for source address tracking - -file USERDIR/mta.whitelist may now contain a list of internal MTA ip addresses, -which will cause DSPAM to skip to the next 'Received' header when processing -the source address. each IP should be on a newline. - -[20031026.1706] jonz: added signal handling to tools - -added signal handling to tools, to unlock databases upon SIGINT, SIGPIPE or -SIGTERM to avoid stale locks. - -[20031025.1111] jonz: added rolling filter accuracy stats to cgi - -rolling filter accuracy stats allows the user to measure their filtering -accuracy over a period of time (usually monthly or quarterly). stats should -be reset after a good learning period (approximately 4000 spams and nonspams) -to measure accuracy accurately =) - -[20031024.0007] jonz: libdb drivers reworked - -libdb drivers reworked for better: -- locking (exclusive) -- recovery (simple recovery run on open) -- environment management (individual user environments) - -IMPORTANT UPGRADE NOTE: - -run the script 'dspam_movefiles [userdir]' in the tools directory to upgrade to -this new directory storage format. after running, make sure you chown the -correct file ownership to the newly created directories. this should be done -with the MTA shut down and no dspam processes running. - -you will also need to reinstall/reconfigure the CGI - -[20031023.1949] jonz: update to cgi to avoid missed messages - -cgi now tracks the size of the quarantine between viewing and deleting all -messages, to avoid deleting messages that came in while reviewing the -quarantine. - -[20031023.1727] jonz: compensated for converged boundaries - -compensated for a slight break of RFC where two boundaries in a nested -message appear without a blank space in-between, leading to message corruption. -fortunately, this type of behavior is extremely scarce.
- -[20031023.0900] jonz: fixed classification group bug - -fixed a bug that caused classification groups never to fire; datatype -CTX->confidence should be float, not int. - -[20031022.2229] jonz: added "-d %u" to default cgi flags - -added "-d %u" to default dspam cgi flags to assist new users - -[20031022.0930] jonz: fixed bug preventing multiple group subscriptions - -fixed a bug that prevented a user from being subscribed to multiple -groups - -Version 2.7.6.10 ----------------- - -[20031022.0930] jonz: added support for managed shared groups - -the group type 'shared' can be appended with ',managed' to convert the shared -group into a managed shared group. a managed shared group is the same as a -shared group, only the managed version will share the quarantine box as well, -enabling one user (named after the group) to manage the handling of all -quarantine functions (false positive reporting, etc.). - -this is generally not what users want, as personal information could potentially -be shared with the administrator of the group, however there are some -circumstances where this would be appropriate. - -a regular shared group: - -groupname:shared:user1,user2,userN - -a managed shared group: - -groupname:shared,managed:user1,user2,userN - -[20031022.0930] jonz: corrected long-time stdin bug - -corrected a long-standing, just-discovered bug that caused stdin to be read in very -small chunks (32 bytes each). correcting this bug has caused DSPAM to read -in messages much more quickly. - -[20031022.0930] jonz: cgi to use X-DSPAM-Signature - -when message-id is not present, the cgi will now use the X-DSPAM-Signature -field to uniquely identify each message. - -[20031022.0930] jonz: extended header assembly buffer to 4k - -header assembly buffer extended to 4k; was truncating some longer fields at 1k.
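To make the group file syntax from the managed shared groups entry above concrete, here is a small Python sketch that splits one line into its fields. It is not taken from DSPAM; the function name and the returned dictionary layout are assumptions for illustration only.

```python
def parse_group_line(line):
    """Split 'groupname:grouptype[,managed]:user1,user2,...' into its parts."""
    name, type_field, members = line.strip().split(":", 2)
    # The type field may carry a ',managed' suffix after the base type.
    group_type, *flags = type_field.split(",")
    return {
        "name": name,
        "type": group_type,
        "managed": "managed" in flags,
        "members": [m for m in members.split(",") if m],
    }
```

For example, parse_group_line("groupname:shared,managed:user1,user2,userN") yields type "shared" with the managed flag set, while the plain "groupname:shared:..." form leaves the flag unset.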
- -[20031022.0930] jonz: minor crash bugfix - -an obscure bug has been corrected which caused dspam to crash if the word -"boundary" was placed on a line in the message body, and that line began -with a space or tab. - -[20031022.0900] jonz: false positives not delivered when spam-delivery enabled - -false positives shouldn't be delivered when --enable-spam-delivery is enabled, -since they will be mailed in (or otherwise processed) directly from the user's -inbox. - -to force false positives to be delivered, use the --deliver commandline -argument - -Version 2.7.6.9 ---------------- - -[20031021.1300] jonz: significant changes to mysql driver - -the data type for the 'token' field in the dspam_token_data table has been -changed from BIGINT to VARCHAR. This is due to a bug in MySQL that leaves it unable to -handle some of the large numeric values used for tokens. - -BEFORE UPGRADING, SHUT DOWN YOUR MTA AND ISSUE THE FOLLOWING MYSQL QUERY: - -alter table dspam_token_data modify token varchar(32); - -[20031021.1206] awn: Convenience symlinks for libdb{3,4}_deadlock - -Convenience symlinks dspam_deadlock.libdb4 (in case of libdb4_drv), -dspam_deadlock.libdb3 (in case of libdb3_drv) and dspam_deadlock (in -case of both libdb*_drv) are added and pointed to the appropriate -libdb{3,4}_deadlock binary. - -[20031021.1016] awn: configure: mysql and network-related libraries - --lnsl and -lsocket are added to the mysql client library check where -needed (e.g. on Solaris). - -[20031021.0000] jonz: changed signature format to include frequency - -WARNING: You should delete all your temporary signature information before -upgrading to this version, as the signature format has changed. You can do -this by deleting all your .sig files or issuing a -"delete from dspam_signature_data" query if using a SQL-based driver. - -RATIONALE: When performing classification queries with signatures, the -frequency is necessary to ensure an identical calculation.
- -[20031021.0000] jonz: added support for 'CLASSIFICATION' group - -A 'CLASSIFICATION' group type has been added. Classify groups are groups of -users who share the results of spams against their own personal dictionaries. -This means that for every message that comes in for any user in the group, -dspam classifies that message for every user and if any user believes the -message to be spam, it is marked as spam for the destination user. - -To avoid false positives, external classification is only used when there is -a confidence level of 0.30 or higher of spam. The confidence level is -calculated with Chi-Square. - -Members of this type of group should only join after their initial training -period. Members may also be part of an inoculation group, but users can -not be a part of both a classify group and a shared group. - -[20031021.0000] jonz: changed default probability for single-corpus tokens - -changed the probability for tokens that appear only in one corpus: - -TYPE FROM TO -Appears +10 in Spam .9901 .9999 -Appears <10 in Spam .9900 .9998 -Appears +10 in Innocent .0099 .0001 -Appears <10 in Innocent .0100 .0002 - -[20031019.2200] jonz: added test-conditional training support - -added configure flag --enable-test-conditional which will enable test- -conditional training. test-conditional training will automatically re-train -the user's dictionary on spam or false positive until the message condition is -met (e.g. until the user's dictionary no longer results in misclassification of -the message being retrained).
this training has a maximum number of 5 -iterations, and will only invoke when: - -- The user has > 4000 innocent messages in their corpus, and is reporting - a spam - -- The user is reporting a false positive (regardless of the number of -messages in their corpus) - -[20031019.2016] jonz: added support for shared groups in mysql_drv driver - -support has been added for shared groups using the mysql_drv driver, but with -one caveat: if you will NOT be enabling "virtual users" support, you will need -to create a user on your system for each group you add. This is because the -mysql_drv driver maps user ids in the database to users on the system. this -is not an issue when "virtual users" support is enabled. - -Version 2.7.6.8 ---------------- - -[20031019.1722] jonz: added mysql.sock functionality - -added functionality for connecting via mysql.sock instead of TCP. specify -pathname to socket in lieu of hostname to implement. - -[20031019.1700] jonz: eliminated false-positive retrain headers - -eliminated the additional X-DSPAM headers added when reclassifying a -false positive. the headers from the original classification are -preserved. - -[20031019.1530] jonz: centralized syslog logging of mysql query errors - -centralized/standardized syslog logging of all mysql query errors - -[20031019.1530] jonz: corrected bug in virtual users w/mysql - -corrected a bug causing some tools to fail when virtual users is enabled while -using the mysql_drv driver. - -[20031018.1050] jonz: corrected type-o in dspam_corpus.in - -fixed close(PIPIE) type-o in dspam_corpus.in - -Version 2.7.6.7 ---------------- - -[20031017.2230] jonz: enhanced overall inoculation processing - -code cleanup of inoculation processing; one central subroutine. fixed some -minor related bugs. 
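The test-conditional training rule from the [20031019.2200] entry above (retrain until the message is classified correctly, at most 5 iterations, subject to the two eligibility conditions) could look roughly like this in Python. The classify and train callables are hypothetical stand-ins for the filter, and counting the first training pass toward the limit of 5 is an assumption; the entry does not say how DSPAM counts it.

```python
MAX_ITERATIONS = 5    # maximum number of training passes, per the entry
MIN_INNOCENT = 4000   # innocent-corpus threshold for spam reports


def should_retrain(reporting, innocent_count):
    """Apply the two eligibility conditions listed in the entry."""
    if reporting == "false_positive":
        return True  # regardless of corpus size
    return reporting == "spam" and innocent_count > MIN_INNOCENT


def retrain(classify, train, message, correct_class, eligible):
    """Train at least once; if eligible, keep training until the filter
    agrees with correct_class or the iteration limit is reached.
    Returns the number of training passes performed."""
    passes = 0
    while passes < MAX_ITERATIONS:
        train(message, correct_class)
        passes += 1
        if not eligible or classify(message) == correct_class:
            break
    return passes
```

Passing eligible=False reduces this to ordinary single-pass training, which is one way to read the behavior when neither condition applies.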
- -[20031017.2129] jonz: corrected external inoculation processing - -external inoculations (--corpus --inoculate --addspam combination) resulted in -an error causing the user to never be inoculated, however all users in the -inoculation group were. corrected this bug so that the destination user would -also be inoculated. - -Version 2.7.6.6 ---------------- - -[20031017.1930] jonz: fixed bugs in CGI 'From' line reporting - -fixed a bug that caused malformatting in the 'From' line when placing in spam -quarantine - -[20031017.1930] jonz: fixed bugs in false positive processing - -fixed a bug, which now strips out any quarantine message 'From' line added by -DSPAM prior to processing. - -[20031017.1930] jonz: fixed variable definition problems with experimental code - -fixed bugs in experimental code; should not affect normal users, but broke -the build anyway. - -Version 2.7.6.5 ---------------- - -[20031017.1730] jonz: added --enable-experimental - -added --enable-experimental flag which activates experimental code, moved -the following code bases to experimental: - -- Versatile Language Message Inoculation Format - (standard for sending/receiving inoculations across multiple anti-spam - platforms and systems) - -- Counting of unknown tokens in messages - -[20031017.1700] jonz: only inoculate users who require inoculation - -inoculation now only inoculates users who would otherwise have misclassified -the message being presented - -[20031017.1600] jonz: changed all /tmp files to USERDIR - -all /tmp files are now written to USERDIR to avoid a race condition. - -[20031016.2207] awn: libdb detection is changed again (sigh) - -Probing for -ldb- and -ldb is resurrected again (needed -for some version of Debian with libdb v3.2.9). Difference from previous -one is using libtool for linking test program at the "header- -vs. library version" check stage.
- -[20031016.1837] jonz: changed high characters to 'z' instead of ignored - -changed all high characters to z's; previously ignored them. effective way to -improve filter rate on spams using wide characters. credit for this technique -given to Brian Burton. - -[20031016.1400] jonz: added warning about MySQL bug to README - -added information about the bug in MySQL versions < 4.0.15.stable to the -MySQL README. - -[20031016.1227] jonz: compensated for mysql_drv insert bug - -compensated for mysql_drv insert bug; made better code in both mysql_drv and -ora_drv to handle insert failures with more grace - -[20031016.1142] jonz: corrected token insert debug output - -corrected debug output for token inserts to display correct query and disk -state. - -Version 2.7.6.4 ---------------- - -[20031016.0946] jonz: switched to MyISAM MySQL tables - -InnoDB turned out to be much slower than MyISAM, so all MySQL objects have -been changed to be of type "MyISAM". - -[20031015.1434] jonz: added exit code mirroring of LDA - -added exit code mirroring of LDA; if any calls to LDA fail, dspam will return -the last failed exit code - -[20031015.1045] jonz: added caching of getpwnam() and getpwuid() information - -added caching of getpwnam() and getpwuid() information for non-virtual users -(already caches for virtual users). this was added to keep some tools from -hammering on LDAP or other local authentication mechanisms. - -Version 2.7.6.3 ---------------- - -[20031014.2211] jonz: fixed 100% cpu utilization bug in libdbX_deadlock - -fixed a bug in libdbX_deadlock causing 100% cpu utilization on linux - -[20031014.1935] jonz: fixed auto-recovery in libdb drivers - -fixed bugs in auto-recovery mechanism in libdb drivers - -[20031014.1545] jonz: added support for accepting inoculation messages - -Added support for "Inoculation Message Format", a new standard which -is currently in the form of an Internet-Draft, to allow inoculation -via email and trusted checksums. 
- -[20031014.0824] jonz: added X-DSPAM-Signature - -X-DSPAM-Signature is NOT a replacement for having in-line signatures -but is useful for debugging purposes - -[20031014.0842] jonz: enhanced boundary recognition - -enhanced boundary recognition to catch boundaries with malformatted -definition lines - -[20031013.2217] jonz: fixed bug in dspam_2mysql - -fixed type-o in 'false-positives' field to false_positives - -[20031013.1949] jonz: better html filtering - -implemented better filtering of some useless html tag data, focus more on -content; resulted in the catching of a few more spams - -[20031013.1832] jonz: added --inoculate flag - -added support for inoculation using --inoculate flag. this can be used in -conjunction with external inoculation as described in the README file. - -Version 2.7.6.2 ---------------- - -[20031013.1443] jonz: fixed algorithm initialization bug - -fixed a bug in the initialization of algorithm data, which caused some -miscalculations whenever the first token was very innocent. - -[20031013.1413] jonz: changed token sorting algorithm - -token sorting now sorts by delta first, then by frequency; this means -tiebreakers will be based in part on token frequency - -[20031013.1329] jonz: added deadlock detection tool - -for large-volume implementations, added a deadlock detection tool, -libdb3_deadlock or libdb4_deadlock. this tool can be run at system start and -will continue to perform deadlock operations in the background. - -[20031013.1317] jonz: implemented deadlock detection - -Implemented calls to libdb's deadlock detection mechanism - -[20031013.1250] jonz: modified Chi-Square algorithm for better performance - -Chi-Square algorithm changed to use 25 tokens, ignoring mid-range - -[20031012.1831] jonz: changed group file format, added inoculation type - -changed group format to: - -groupname:grouptype:user1,user2,userN - -BE SURE TO UPDATE IN YOUR GROUP FILE - -there are now two types of groups: shared and inoculation. 
the shared group -is the group everyone is used to, sharing dictionaries and signature dbs. - -the inoculation group allows each member of the group to maintain their own -private dictionary and signature database, but members of the group will -automatically train each other's dictionaries with spams they manually forward in, -which will help 'inoculate' all other group members from new spams going out. - -examples: - -development:shared:bob,tom,bill - -company:inoculation:jim,ted,robert - -a user can be a member of multiple inoculation groups, but cannot be a member -of both a shared group and an inoculation group. - -[20031012.0009] jonz: fixed freed-memory bug in decode.c - -fixed freed-memory bug in decode.c, which caused an occasional crash when -decoding encoded headers. - -Version 2.7.6.1 ---------------- - -[20031011.1236] jonz: added support for multiple algorithms - -added support for multiple algorithms; e.g. if any of the enabled algorithms -suspects the message is spam, it is spam. you can use the following flags: - ---enable-chi-square ---enable-alternative-bayesian ---disable-traditional-bayesian - -traditional bayesian is enabled by default - -[20031011.1034] jonz: added Chi-Square specific per-token calculations - -when using Chi-Square, added Chi-Square's expanded per-token calculations - -[20031011.0923] jonz: fixed alternative bayesian calculations - -fixed problem with the wrong definition names being used, which caused -alternative bayesian never to get invoked - -[20031011.0923] jonz: fixed a bug in all calculations - -a bug in 2.7.6 was fixed which caused spams to be missed if there were -fewer than 15 tokens available for calculation. this could only occur in the -rarest of circumstances, so it should not have affected much.
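The membership rule stated in the group file format entry above (multiple inoculation groups are fine, but no user may be in both a shared group and an inoculation group) can be checked with a few lines of Python. This is a hedged sketch, not DSPAM's own validation: the function name and the (name, type, members) tuple layout are assumptions, and the "partners" group below is invented for the example.

```python
from collections import defaultdict


def membership_conflicts(groups):
    """groups: iterable of (name, group_type, members) tuples.
    Return the sorted users who belong to both a shared group and an
    inoculation group, which the changelog says is not allowed."""
    types_by_user = defaultdict(set)
    for _name, group_type, members in groups:
        for user in members:
            types_by_user[user].add(group_type)
    return sorted(
        user
        for user, types in types_by_user.items()
        if {"shared", "inoculation"} <= types
    )
```

Using the two example groups from the entry plus a hypothetical second inoculation group containing jim and bob, only bob is reported: jim is in two inoculation groups, which is permitted.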
- -Version 2.7.6 -------------- - -[20031008.2200] jonz: added alternative calculation modes - -added --enable-alternative-bayesian flag which invokes Brian Burton's -alternative Bayesian algorithm - -added --enable-chi-square flag which invokes Chi-Square algorithm - -only one or neither (for default bayesian) flags should be used. debug -information for all three calculations is generated regardless. - -[20031008.2029] jonz: fixed bug in libdb drivers - -fixed a bug which used memory that had already been freed causing -some occasional unpredictable behavior. - -[20031008.1431] jonz: added support for multipart/signed messages - -added support for multipart/signed messages without altering message body. -signature is appended as a text attachment. - -[20031007.1904] jonz: fixed bug in boundary detection - -fixed a bug in boundary detection where boundary would fail to be detected if -it wasn't the first definition on the Content-Type heading. For example: - -Content-Type: multipart/signed; protocol="application/x-pkcs7-signature"; - boundary="------------ms010307080208090601090900" - -would have failed. this bug fix also improves overall boundary detection. - -[20031007.1724] jonz: added source address reporting - -the source address for all messages is now reported via syslog. this uses -the new dspam_getsource() function added to the API. depending on whether the -message is spam or innocent, the message will be reported either to MAIL.INFO -or MAIL.DEBUG. for example: - -dspam[30965]: spam detected from X.X.X.X - -dspam[30414]: innocent message from X.X.X.X - -this can be used for creating automatic blacklists. more to come. - -[20031007.1557] awn: configure script changes - -Configure script now detects version of libdb headers and guesses -appropriate library name from this version. Probed libraries are: - - -ldb-.minor> - -ldb - -As a consequence, for example, symlinking libdb41.so to libdb-4.so is no longer required on FreeBSD.
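The boundary detection fix in the [20031007.1904] entry above comes down to scanning every Content-Type parameter rather than only the first. A minimal Python sketch of that idea follows; it is not DSPAM's decode.c logic, the function name is hypothetical, and it deliberately ignores semicolons inside quoted values.

```python
def find_boundary(content_type):
    """Return the boundary parameter of a Content-Type value, or None.
    The parameter may appear anywhere in the list and in any letter case."""
    # Unfold continuation lines into a single line.
    unfolded = " ".join(content_type.split())
    # Everything before the first ';' is the media type; the rest are parameters.
    for param in unfolded.split(";")[1:]:
        name, sep, value = param.strip().partition("=")
        if sep and name.strip().lower() == "boundary":
            return value.strip().strip('"')
    return None
```

Applied to the multipart/signed header quoted in the entry, this returns the boundary string even though protocol= comes first.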
- -Version 2.7.5 -------------- - -[20031007.0930] jonz: date field no longer ignored - -date field is no longer ignored; time of day can sometimes play an effective -role in identifying spam or preventing false positives. - -[20031006.1911] jonz: Oracle storage driver - -first release of ora_drv; storage driver for Oracle. please see README file -for more information. - -[20031004.1423] awn: support for program-name transformation. - -Configure options `--program-prefix', `--program-suffix' and -`--program-transform-name' are fully supported now except CGI. -(Was: dspam_corpus and dspam_genaliases don't honor transformed name of -dspam binary). - -[20031003.1832] jonz: fix for base64-encoded binary messages - -bug fixed which caused corruption in some base64-encoded single-part -messages in which the only component was a binary file. - -[20031003.0031] jonz: automatic recovery for libdb drivers - -automatic recovery has been implemented for libdb drivers - -[20031003.0031] jonz: DB_ENV implemented for libdb drivers - -DB_ENV locking has been implemented for libdb drivers. This obsoletes -storage driver dot-lock file locking, which is no longer used. quarantine -dot-lockfile locking is still used when writing to the quarantine. - -Version 2.7.4 -------------- - -[20031002.1728] jonz: modified corpus flag to force results - -use of corpus flag now forces results to match commandline flags, meaning -innocent messages no longer need to be fed in first. - -[20031002.0800] jonz: added unique id to dspam_ngstats - -for systems without a static public ip address, a unique id can be configured -in dspam_ngstats.c (NGSTATS_UID) comprised of alphanumeric characters, periods, -and underscores. any invalid characters will cause stats to be ignored. 
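The character rule for the dspam_ngstats unique id (alphanumeric characters, periods, and underscores only) is easy to check before compiling the id in. The Python sketch below is illustrative only; the function name is an assumption, and treating an empty id as invalid is a guess the entry does not confirm.

```python
import re

# Alphanumerics, periods and underscores, as listed in the entry.
VALID_UID = re.compile(r"[A-Za-z0-9._]+")


def is_valid_uid(uid):
    """True if uid contains only characters the entry allows.
    Assumption: an empty id is rejected."""
    return VALID_UID.fullmatch(uid) is not None
```

An id such as mail.example_01 passes, while anything containing a space, hyphen, or other punctuation does not.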
- -[20031002.0800] jonz: removed broken sanity checks - -some sanity checks were firing off erroneous messages in 2.7.3; these have -been removed - -[20031001.0800] jonz: fixed --enable-large-scale with mysql_drv - -modified all drivers to add support for --enable-large-scale with mysql_drv - -[20031001.0800] jonz: added dspam_ngstats - -added dspam_ngstats, a global stats reporting tool designed for global -stats tracking for dspam - -[20030930.1547] awn: Convenience symlinks for libdb{3,4}_purge - -IMHO, `libdb3_purge' and `libdb4_purge' are not very descriptive names. -Therefore, 2 convenience symlinks are added: - o dspam_purge.libdb4 (dspam_purge.libdb3 in case of libdb3 driver), and - o dspam_purge -both pointed to the appropriate libdb{3,4}_purge. - -[20030930.1517] jonz: fixed problem with trailing commas in update command - -Version 2.7.3 -------------- - -[20030929.1450] jonz: fixed problem with groups - -groups has been repaired; apparently a line of code was inadvertently deleted -from the source tree causing it to fail in 2.7.2. - -[20030928.0253] awn: New scheme for conditional compilation of storage drivers - -All following is for `configure.ac' and resulting `configure' script: - - Now configure doesn't assume that storage driver sources are - named `${storage_drv}.c' and `${storage_drv}.h' - - You need to list resulting .lo files in the `${storage_drv_objects}' - variable instead. - - Storage driver specific subdirectories should be listed in the - `${storage_drv_subdirs}' variable also. - -This allows any number (including zero) of driver-specific sources -and subdirectories, automatic builds of driver specific tools in these -directories (like `libdb4_purge') and should work properly in the VPATH -environment. - -[20030928.0248] awn: configure.ac bug fix - -Fix CPPFLAGS related bugs in the storage drivers sections of -`configure.ac'.
- -All three storage sections in the configure.ac was have code like - CPPFLAGS="$DB_LIBS $CPPFLAGS" -instead of - CPPFLAGS="$DB_CPPFLAGS $CPPFLAGS" -(replace DB_ by MYSQL for give mysql case). - -This was my bug, I know. - -[20030927.1600] jonz: added docs for Courier MTA - -added documentation for configuring Courier MTA with DSPAM. contributed by -Michael Greb. - -Version 2.7.2 -------------- - -[20030925.2231] jonz: added --disable-trusted-user-security - -added configure flag --disable-trusted-user-security to disable trusted user -security, rather than trying to maintain two different versions of dspam. - -[20030925.1103] jonz: added support for RedHat's built-in libdb4.0 - -added support for RedHat's built-in libdb-4.0. This should also provide -compatibility with any other libdb-4.0. An alias will still be necessary: - -ln -s /usr/lib/libdb-4.0.so /usr/lib/libdb-4.so - -[20030925.1103] jonz: removed -d $u from default LDA configuration - --d $u coming first in the argument list caused some problems; -d %u should now -be used instead in the MTA configuration. - -[20030925.1103] jonz: patch to compensate for yahoo broken RFC bug - -implemented patch to compensate for a bug in the yahoo client where yahoo -breaks RFC and writes an end boundary prematurely, causing the real boundary -to get corrupted. - -[20030925.0855] jonz: changed compile flag --enable-virtual-uids - -changed compile flag --enable-virtual-uids to --enable-virtual-users - -[20030925.0852] jonz: fixed plain text html signature placement bug - -fixed a small bug that caused DSPAM to place the signature in html code samples -in plain text. - -[20030924.0000] jonz: added support for virtual users - -added support for virtual users in mysql_drv. this is necessary when the -users don't actually exist on the system. use --enable-virtual-users to -enable. only necessary when using the mysql storage driver. 
- -[20030923.2043] jonz: fix for multiple user bug - -restored %u and adjusted docs for multiple local user bug with sendmail - -Version 2.7.1 -------------- - -[20030923.0050] jonz: fixes for libdb tools - -several small fixes to issues with compiling libdb tools - -[20030923.0045] jonz: bug fix for header decoding - -fixed a bug causing some headers to decode incorrectly - -[20030923.0030] jonz: bug fix for attachments and signature - -added code to specifically NOT append a signature to any segments that have -"Content-Disposition" of type attachment. - -[20030922.1900] jonz: added more debug output - -added more debug output (on error) to mysql driver and libdspam - -[20030920.0840] jonz: mysql_drv to use -lm -lz - -switched mysql_drv to use -lm -lz in place of -lcrypto. both apparently have -compress/uncompress functions - -Version 2.7 ------------ - -[20030919.0900] jonz: added dspam_merge tool - -Version 2.7.beta.3 ------------------- - -[20030915.0000] jonz: added mysql_drv storage driver - -mysql_drv storage driver added for MySQL functionality. please see README -and tools.mysql_drv for more information. - -[20030914.1410] jonz: fixed bug in innocent_hits - -fixed bug where some tokens received 2 innocent hits instead of 1 (apparently -is an old bug but did not dramatically affect effectiveness) - -[20030913.0956] jonz: implemented quarantine locking - -implemented quarantine locking mechanism independent of driver locking - -[20030913.0900] jonz: internalized API locking - -all API locking performed internally (driver-specific). no external locking -calls exist; part of _ds_init_storage and _ds_shutdown_storage. reason: -not all drivers will require context locking (and hopefully someday neither -will libdb3/libdb4 drivers).
- -[20030912.0000] jonz: locks to use USERDIR - -for driver compatibility, all .lock file locking takes place in USERDIR, even -for large-scale implementations - -[20030911.0000] jonz: driver config script management - -implemented driver configure script management and tools.[driver] for -driver-specific tools. - -Version 2.7.beta.2 ------------------- - -[20030910.0054] jonz: message header decoding - -added message header decoding per RFC 2047 - -[20030909.1830] jonz: implemented standardized return codes - -implemented standardized return codes for the major api functions: -EINVAL, EFAILURE, ELOCK, EFILE, EUNKNOWN - -[20030909.1730] jonz: ported all tools to new driver API - -ported all tools to new driver API. dspam_purge has been replaced with -a driver-specific purge mechanism (default: libdb4_purge), due to the fact -that not all drivers will need to purge, and recreating datafiles is a very -specific function...still uses the storage driver api's locking mechanism. - -[20030909.0051] jonz: removed dspam_convert - -removed dspam_convert tool for 2.5->2.6 upgrades - -[20030909.0051] awn: configure script changes - -`--enable-gcc-warnings' configure option is added. - -[20030908.2000] jonz: implemented storage driver API - -implemented storage driver api. default driver is libdb4_drv - -[20030907.1627] awn: dspam_genaliases changes - -dspam_genaliases now generates `nospam-USER' aliases (aliases for false -positive reporting) by explicit request only. New `--nospam' command -line option is used for this. - -Version 2.7.beta.1 ------------------- - -[20030907.1140] jonz: user identification and passthru changes - -the method of user identification and passthru has been changed: - - - DSPAM no longer recognizes -d to identify the user, but instead --user - must be used. --user will never be passed onto the local delivery agent.
- - - In order to pass the -d flag through to the local delivery agent, it - must be specified either separately on the commandline, or at configure - time. - - - To allow -d flag support to be supported at configure time (and when - overriding untrusted users), the $u variable has been added to dspam. - any commandline arguments passed through DSPAM matching $u will be - replaced with the actual destination username (specified with --user - or automatically forced for untrusted users). - -These changes require some modifications to the mailer configuration. In the -following example for sendmail, you would change the following line in -the Mlocal block: - -A=/usr/local/bin/dspam -d $u - -to: - -A=/usr/local/bin/dspam --user $u -d $u - ---user is not passed through to the LDA, but -d is. Alternatively, you could -remove '-d $u' from sendmail.cf, and configure dspam with: - ---with-local-delivery-agent="/path/to/lda -d \$u" - -NOTE: be sure to escape the $ in $u ONLY when specifying it on the commandline. -This will prevent $u from being overwritten with the shell's environment -variable 'u'. - -Specifying this at configure time is especially useful if you plan on running -dspam via commandline and do not want to have to specify -d [username] in -addition to your --user [username] arguments. - -[20030907.1440] jonz: removed --deliver-cmd and --quarantine-cmd - -removed runtime --deliver-cmd and --quarantine-cmd functions; added configure -time --with-quarantine-agent="/path/to/agent" to override default quarantine -function. - -[20030906.0000] jonz: fix for boundary definition identification - -fix to detect non-lowercase multipart boundary definitions - -[20030906.0000] jonz: partial rewrite of internal sorting routines - -partial rewrite of tbt sort routines to drop recursion and potential stack -problems to follow. problems only experienced when using API with -multithreaded code. 
original patch submitted by Stuart Gathman - - -[20030906.0000] jonz: forced --deliver-cmd and --quarantine-cmd to require -trusted user permissions. dspam also must be compiled with ---enable-insecure-functions for them to be available. - -[20030906.0000] jonz: trusted user implementation - -implemented trusted user approach with user and passthru overrides for the -untrusted users. see README for more information - -Version 2.6.5.2 ---------------- - -[20030906.0000] jonz: insecure parameter check - -insecure parameter check; checks parameters for insecure characters: -| ; < > ` - -Version 2.6.5.1 ---------------- - -[20030905.1105] jonz: partitioned insecure functions - -partitioned potentially insecure functions to require the configure flag ---enable-insecure-functions to be set to activate. these include: - ---deliver-cmd ---quarantine-cmd - -special attention needs to be given to the execution permissions of the dspam -agent when enabling these functions to avoid users being able to -execute arbitrary commands on the server. it should be understood that these -are potentially insecure functions and could potentially lead to the execution -of arbitrary code if exploited by a malicious user or CGI. - -[20030905.0418] jonz: fixed bug: from header corruption - -if MTA is passing in From headers, they were being corrupted by DSPAM's -header parsing. fixed to specifically parse From headers differently - -[20030904.1422] jonz: fixed bug with quoted-printable debugging - -fixed a small bug that would fail to decode a quoted character immediately -following a line break - -[20030904.1127] awn: c89 compatibility - -C89 compatibility patch is applied. Patch author: Albert Chin-A-Young - - - * configure.ac, base64.c, decode.c, dspam.c, error.c, - error.h, libdspam.c, localdb.c, lock.c, signature.c, - tools/dspam_dump.c: Allow building with a C89 compiler - which does not have ISO varargs.
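The insecure parameter check from the 2.6.5.2 entry above rejects arguments containing any of the shell metacharacters | ; < > and the backtick. A rough Python equivalent is shown below as an illustration; the names are hypothetical and the real check lives in DSPAM's C sources, so how DSPAM reacts to a match is not shown here.

```python
# The five characters listed in the changelog entry.
INSECURE_CHARS = set("|;<>`")


def has_insecure_chars(argument):
    """True if the argument contains any character from the listed set."""
    return any(ch in INSECURE_CHARS for ch in argument)


def insecure_parameters(argv):
    """Return the arguments that would fail the check, in original order."""
    return [arg for arg in argv if has_insecure_chars(arg)]
```

An argument list such as ["--user", "bob; rm -rf /"] would have its second element flagged, while ordinary flags and usernames pass through.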
- -[20030904.1046] awn: work around Solaris' make - -tools/Makefile.am doesn't use the $< automatic variable because Solaris -make (at least some versions) doesn't support it. - -[20030904.0700] jonz: segfaulting on _ds_message_destroy - -fixed a bug where destroying CTX->message caused a segfault. fortunately, this -bug would have never been reached by the agent or the api. - -[20030904.0700] jonz: nfs locking - -modified lock.c to work over nfs mounts, only checking pid when hostname -matches. maximum 20-minute stale lock removal. - -[20030903.1716] awn: dspam_corpus and dspam_genaliases update - -dspam_corpus and dspam_genaliases now use the real path to the dspam binary -instead of assuming default /usr/local/bin/dspam. - -dspam_genaliases outputs aliases table to the stdout now by default. -Use new `-o filename' or `--output filename' option to redirect it to -a file. - -dspam_genaliases generates `nospam-USER' aliases in addition to the -`spam-USER' aliases now. - -[20030903.0145] jonz: fixed memory leak in dspam agent - -fixed internal memory leak in dspam agent where CTX->message was not destroyed. -only leaked until dspam agent exited, then memory was reclaimed - -[20030903.0145] jonz: updated example.c - -updated example.c to show correct CTX->message destruction - -[20030903.0115] jonz: fixed bug in false positive reporting - -fixed bug where innocent_hits incremented twice on false positive report - -Version 2.6.5 -------------- - -[20030902.0000] jonz: added --version commandline parameter - -added --version commandline parameter to display version; -v is not used as -it could be a passthru parameter to an LDA.
- -[20030902.0000] awn: dspam_purge changes - -minor fixes to dspam_purge tool - -[20030901.0000] awn: configure changes - -- implemented checks (and use of results) for -- checking for math.h and fabs() were added, use -lm where need -- aesthetic changes - -[20030901.0000] awn: removed compiler warnings - -removed "no previous prototype" warnings with some compilers - -[20030901.0000] awn: compiler warnings - -miscellaneous changes to remove some compilation warnings - -Version 2.6.5-rc1.1 -------------------- - -[20030831.0000] jonz: debug output - -removed left over debug output - -Version 2.6.5-rc1 ------------------ - -[20030829.0000] jonz: fixed broken rfc attachments - -made compensation for broken rfcs with embedded attachments, where original -message should've been message/rfc822 but was instead attached as plain/text. -this caused attachments to be processed/consume large quantities of time. -decode.c modified to accept a new boundary definition from any header. - -[20030829.0000] jonz: --corpus flag foregoes message delivery/quarantine - -use of the --corpus flag will now prevent the messages fed in as corpus from -being delivered/quarantined - -[20030829.0000] jonz: added commandline delivery override - -commandline flags --deliver-cmd and --quarantine-cmd added to override the -default behavior for delivery (MLOCAL) and quarantine (either MLOCAL or -quarantine depending on configuration). syntax: - -dspam --deliver-cmd "/path/to/cmd -flags" -dspam --quarantine-cmd "/path/to/cmd -flags" - -(be sure not to use = sign). - -when overridden values used, the user id is by default NOT passed through to -the called program. use --with-passthru to pass ARG_USER %USER through to -the called program. 
example: - -dspam --deliver-cmd "/bin/cat" --with-passthru - -actually calls: /bin/cat -d [username] - -dspam --deliver-cmd "/bin/cat" - -actually calls: /bin/cat - -[20030829.0000] jonz: signature insertion moved inside body tag - -dspam signature now inserted (wherever possible) inside HTML body tags to -avoid droppage under certain conditions. - -[20030829.0000] jonz: changed dspam signature - -dspam signature changed to a visible signature to work with clients that -reformat only visible data (Eudora). new signature: - -!DSPAM:[SERIAL]! - -Version 2.6.5-beta-2 --------------------- - -[20030826.1800] jonz: added --enable-delivery-to-stdout option - -added --enable-delivery-to-stdout option which causes all delivered messages -to be printed to stdout rather than piped to an LDA. if you wish to have spams -printed to stdout as well, use the --enable-spam-delivery option in -conjunction. - -[20030825.0031] jonz: signature attachment mode - -coded signature-attachments mode, rewriting messages to include a dspam -signature attachment with full data, instead of writing the server-side -attachment. use --enable-signature-attachments to enable. - -[20030824.2345] jonz: application/dspam-signature media type - -added application/dspam-signature media type recognition - -Version 2.6.5-beta-1.1 ----------------------- - -[20030823.2010] jonz: fixed bug for empty headers - -fixed a bug where segments with empty headers would be dropped in reassembly -(currently these only seem to appear in mailer-daemon messages) - -Version 2.6.5-beta-1 --------------------- - -[20030823.1804] jonz: groups now share same signature file - -groups now share same signature file enabling them to use a single group alias -for forwarding spams. - -[20030823.1339] jonz: added new configure flags - ---enable-homedir-dotfiles -When enabled, instead of checking for $USERDIR/$USER[.nodspam|.dspam], -DSPAM will check for a .nodspam|.dspam file in the user's home directory.
- ---enable-opt-in -Causes DSPAM to filter mail only for users with a .dspam dotfile. The default -is opt-out, which requires a .nodspam file to exist to bypass filtering. - -when using --enable-homedir-dotfiles, dspam installs as setuid root. - -[20030823.1100] jonz: fixed segfaulting on signature reversal - -[only affected alpha-4-internal] -fixed a bug where dspam segfaulted while reversing a signature making it -impossible to train dspam using signatures with alpha-4-internal. - -[20030823.1100] jonz: added support for message/rfc822 - -[only affected alpha-4-internal] -added support for parsing message/rfc822 components; signature was not being -found in forwarded messages using this media type. - -[20030822.0929] jonz: added fp alerts to cgi - -added customizable false positive alerts to cgi. the alerts list is -compared to message headers, and all messages that match are highlighted in yellow. -alerts are stored as $USERDIR/$USER.alerts. - -[20030822.0929] jonz: fixed decoding header bug - -fixed a bug in the header decoding where the original encoding type was -reassembled into the message, instead of the decoded type. fix only -affected alpha-4 (internal). - -[20030822.0929] jonz: moved signature append to process - -moved appending of signature out of delivery_message and into the process -function, using the new message structures instead of parsing. this also -fixes a problem on memory failure: the delivery_message function -no longer needs to allocate memory. - -[20030822.0016] jonz: adjusted lock timeout - -adjusted lock timeout from 10 to 20 seconds. depending on the load of your -machine, this could be set higher or lower. the higher the setting, the less -chance of any failover deliveries being made, and the more chance of multiple -processes lined up waiting for a lock on a user's mailbox.
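The trade-off in the lock timeout entry above can be illustrated with a minimal sketch. It is written in Python purely for illustration (DSPAM's own lock.c is C and is not reproduced here); `acquire_lock`, `release_lock`, the `poll` interval, and the pid written into the file are all hypothetical choices, and only the 20-second default comes from the entry.

```python
import os
import time

LOCK_TIMEOUT = 20.0  # seconds; the entry above raised this from 10


def acquire_lock(path, timeout=LOCK_TIMEOUT, poll=0.1):
    """Try to create `path` exclusively until `timeout` elapses.

    Returns True when the lock file was created, and False when the
    timeout expired and the caller should fall back to failover delivery.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail while another process holds the lock.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)  # wait only after a failed attempt
            continue
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))  # hypothetical: record the owner
        return True


def release_lock(path):
    """Remove the lock file so that waiting processes can proceed."""
    os.remove(path)
```

A longer `timeout` makes the False branch (failover) rarer, at the cost of more processes polling in the loop at once.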
- -[20030822.0014] jonz: documentation tweaks - -a few miscellaneous tweaks - -[20030821.2145] jonz: added --enable-spam-delivery - -added configure flag --enable-spam-delivery causing all spams to be delivered -instead of quarantined (for use with X-DSPAM header filtering) - -[20030821.1935] jonz: rewrite of message post-processing - -Message post-processing rewritten; including appending of signature, -message re-write, etcetera. - -[20030821.1908] jonz: added header information - -X-DSPAM-Result: Spam || Innocent -X-DSPAM-Probability: (Actual Probability) - -[20030821.1820] jonz: removed CTX->copyback - -CTX->copyback is now obsolete. All base64 decoding is performed on -CTX->message, which is available from the context, or by calling the -_ds_assemble_message() function using the message structure as a parameter. - -[20030821.1730] jonz: changes to DSPAM_CTX - -+ struct _ds_message *message; /* Message Components */ - -for compatibility with existing API, dspam_process still accepts a const char *, -however tools that already perform message actualization (such as the DSPAM -agent) can set CTX->message to the existing struct _ds_message * to avoid -reprocessing the message, and to carry over any encoding changes. - -[20030821.1730] jonz: implemented new decode/actualization functions in sig - -implemented use of new actualization and decoding functions [decode.c] in -dspam.c's signature scan code. - -[20030821.1729] jonz: finished block decoding functions - -/* Public decode function */ -char * _ds_decode_block(struct _ds_message_block *block); - -/* Private decoding functions */ -char * _ds_decode_base64(const char *body); -char * _ds_decode_quoted(const char *body); - -[20030820.0015] jonz: finished preliminary message actualization - -decode.c: finished preliminary actualization code (code responsible for -actualizing a message into its individual components). experiments with -plain messages and non-embedded multipart messages succeeded.
next phase of -testing to include embedded multipart messages, including spams that are -designed to frequently break RFC. once testing/patching is complete, -decoding routines to follow. - -[20030819.0000] jonz: signature embedding changes - -signatures are now embedded in every text segment of a message to -ensure they are forwarded properly - -[20030818.1350] awn: fix for empty messages - -(Submitted by Andrew W. Nosenko) - -* added check for empty data to prevent segfault - -[20030817.1336] awn: configure script changes - -(Submitted by Andrew W. Nosenko) - -* configure.ac: Work around versioning issues of some versions of - db-4. E.g. db_create() may not be a real function but a simple - forwarding macro to db_create_4001(). - -* configure.ac: New configure option `--with-db4-libraries' (as - a pair for `--with-db4-includes') - -[20030817.1230] jonz: added --disable-bias configure flag - -when configure is run with --disable-bias, dspam no longer biases the -statistics in favor of innocent mail. This may increase the filter's -effectiveness in catching spam, but could also potentially result in less -false positive protection. some argue that eliminating bias is more -accurate, not less. - -[20030815.0300] jonz: added dspam_genaliases script - -a small script to create an aliases table from /etc/passwd - -[20030814.1928] jonz: added large-scale directory support to tools - -ported tools to support large-scale directories (see below). - -[20030814.0005] jonz: added large-scale directory support - -when configure is run with --enable-large-scale, dspam stores all its user -files in large-scale mode. for example, user root's files would be stored in -/etc/mail/dspam/r/ro/root. directories are created automatically as needed. - -Version 2.6.4.1 ---------------- - -[20030816.2352] jonz: parse fix for boundaries with spaces - -added fix for multipart emails with spaces in the boundary definition -(e.g. boundary= "blah").
Discovered in some of the newer 'Urgent Response' -type spams. - -Version 2.6.4 -------------- - -[20030809.1115] jonz: corpus spams marked as misses - -spams learned through dspam_corpus are now marked as misses instead of -caught spam. - -[20030808.1945] jonz: changes to header processing - -Message-ID is now considered for useful information. Received header is now -considered, but parsed in a different manner preserving IP addresses and -other useful information. - -[20030808.1945] jonz: blank signatures will no longer get written - -blank signatures are a result of a failover passthrough for a particular -user. dspam has been changed to not write a signature if the signature -itself is blank, preventing it from appearing in an email. - -[20030808.1945] jonz: added .nodspam file functionality - -in an attempt to conserve disk space, a username.nodspam file may be -touched in the /etc/mail/dspam directory, which will cause all messages -for that user to be passed through dspam and not processed. this will -prevent a dictionary or signature file from being built and save disk -space. users wishing not to use dspam can still simply not use it, -but dropping a .nodspam file will prevent any files from being created. - -[20030805.1630] jonz: fixed multiple header destroy calls - -fixed bug where the header nodetree was destroyed a second time in some error -paths that cleaned up and returned, causing a segmentation fault. - -[20030805.1400] jonz: added quoted-printable decoding - -added quoted-printable decoding; decodes hex codes into actual characters.
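As a rough illustration of the quoted-printable entry above, the sketch below turns `=XX` hex codes into characters and removes soft line breaks (an equal sign directly before a line ending). It is a Python sketch of the general technique from RFC 2045, not DSPAM's C decoder; the function name `decode_quoted_printable` is hypothetical, and malformed sequences are simply passed through unchanged, which is an assumption.

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"


def decode_quoted_printable(text):
    """Decode =XX hex codes and drop soft line breaks in `text`."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "=":
            pair = text[i + 1:i + 3]
            if len(pair) == 2 and pair[0] in HEX_DIGITS and pair[1] in HEX_DIGITS:
                out.append(chr(int(pair, 16)))  # hex code -> character
                i += 3
                continue
            if text.startswith("\r\n", i + 1):
                i += 3  # soft line break with CRLF: rejoin the line
                continue
            if text.startswith("\n", i + 1):
                i += 2  # soft line break with bare LF
                continue
        out.append(ch)  # ordinary character, or a malformed '=' kept as-is
        i += 1
    return "".join(out)
```

For example, `decode_quoted_printable("caf=E9")` yields `"caf\xe9"`, and a line ending in `=` is joined with the line that follows it.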
- -[20030805.1230] jonz: documentation correction for dspam_corpus - -dspam_corpus now uses the --addspam flag, not -a - -[20030805.1200] jonz: added verbose debugging option - -added --enable-verbose-debug for verbose debugging information to be written -to /tmp/dspam.debug - -[20030805.1200] jonz: new line unbreaking code - -new line unbreaking code to unbreak only quoted-printable lines - -Version 2.6.3 -------------- - -[20030801.0930] jonz: debug after context destruction - -fixed a bug in dspam.c that reported debug information for a context -after it had been destroyed. - -[20030801.0930] jonz: dspam_clean to create new databases - -dspam_clean tool rewritten to create new databases when called in the same -fashion as dspam_purge. this helps keep the databases healthy and -small. - -[20030801.0900] jonz: fix for PGP signatures - -fixed formatting bug causing PGP signatures to be corrupted. fix required -removing line unbreaking from message which could potentially cause dspam to -lose one or two signatures when messages are being forwarded from Microsoft -Outlook. does not appear to be a significant issue. - -[20030801.0900] jonz: fix for unchecked malloc calls - -fixed two unchecked malloc calls -=> struct nt *nt_create(int nodetype) -=> struct nt_node *nt_add(struct nt *nt, void *data) - -submitted by Thomas Lussing - -[20030731.0852] jonz: added syslog logging - -added syslog logging using mail facility - -[20030730.2323] jonz: documentation addition for username case - - added this to the README: - - NOTE: Some authentication mechanisms are case insensitive and will - authenticate the user regardless of the case they type it in. DSPAM, - on the other hand, is case sensitive and the case of the username used - will need to match the case on the system.
If you suffer from this - authentication problem, and are certain all of your users' usernames are - in lowercase, you can add the following line of code to the CGI right - after the call to &ReadParse... - - $ENV{'REMOTE_USER'} = lc($ENV{'REMOTE_USER'}); - -[20030730.2311] jonz: fixed bug in dspam_stats - -fixed formatting bug in dspam_stats causing problems with usernames longer than 16 -characters. submitted by Stuart Gathman - -Version 2.6.2.03 ----------------- - -[20030729.2205] jonz: fixed more line parsing bugs - -fixed some additional bugs in line parsing which may have caused some emails -to appear blank in Microsoft Outlook - -Version 2.6.2.02 ----------------- - -[20030729.0225] jonz: internal cleanup - -removed unused variables and added prototypes for some functions lacking them - -[20030729.0225] jonz: implemented strsep to fix processing snag - -large messages resulted in significant processor consumption due to previous -method of splitting up messages line-by-line. strsep now implemented to remove -this bottleneck. - -Version 2.6.2.01 ----------------- - -[20030710.1000] jonz: fixed bug in dspam_stats - -dspam_stats now reports TS (total spams) as total spams minus spam misses. - -[20030710.1000] jonz: fixed bug in false positives - -fixed a bug where false positives reported without a signature would fail to -decrease the total number of spams. this event should never occur using -dspam, so the fix only matters for third party software using -the dspam library. - -[20030710.1000] jonz: added support for reusable contexts - -added support for reusable contexts, enabling a context to be processed -multiple times. - -[20030704.1827] jonz: fixed condition in chomp - -fixed a condition in chomp where it could potentially cause a segmentation fault if -called with a NULL pointer, or a string with zero length. this should never -occur anyway considering the calling code.
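The guard described in the chomp entry above might look roughly like the following. This is a Python sketch only: `None` stands in for the NULL pointer of the C code, and the assumption that chomp strips a single trailing line ending is ours, since the entry does not say what chomp removes.

```python
def chomp(s):
    """Strip one trailing line ending, tolerating None and empty input."""
    if not s:
        # Covers both None (the NULL-pointer analogue) and "".
        # Indexing s[-1] without this check is the failure being guarded.
        return s
    if s.endswith("\r\n"):
        return s[:-2]
    if s[-1] in "\r\n":
        return s[:-1]
    return s
```

Checking the degenerate inputs first keeps the rest of the function free to index the last character unconditionally.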
- -Version 2.6.2 -------------- - -[20030701.0000] jonz: added DSF_CLASSIFY flag - -added DSF_CLASSIFY flag to libdspam. use of this flag causes libdspam _not_ to -record statistics for a specific operation, but only to evaluate and return -the operation's result. - -[20030701.0000] jonz: fixed bit assignment bug - -fixed a bit assignment bug resulting in clearing of all flags when headers -ignored -submitted by Stuard D. Gathman [stuart@bsmred.dmsi.com] - -[20030701.0000] jonz: fixed bugs related to corpus mail - -fixed a bug causing corpus mail's headers to be ignored -submitted by Stuard D. Gathman [stuart@bsmred.dmsi.com] - -Version 2.6.1.01 ----------------- - -[20030627.1924] jonz: fixed memory free of copyback buffer - -copyback buffer is now freed in dspam.c when context is destroyed - -Version 2.6.1.00 ----------------- - -[20030622.0000] jonz: added ` as delimiter - -[20030620.0000] jonz: added support for group dictionaries - -Group dictionaries enable a group of users with similar email behavior to -share the same dictionary while still maintaining a private quarantine box. -Please see README for more information. - -[20030620.0000] jonz: added dspam_stats tool - -The dspam_stats tool can be used to display the statistics for one or all -users on the system. Please see README for more information. 
- -Version 2.6.0.69 ----------------- - -[20030618.0000] jonz: line unbreaking correction - -correction made to line unbreaking to sanity check for consecutive -equal signs - -Version 2.6.0.68 ----------------- - -[20030612.0000] jonz: change to configure tool - -changed configure tool to look for db_strerror instead of -db_env_create in the event that libdb was built without -environmental functions - -Version 2.6.0.67 ----------------- - -[20030609.0021] jonz: bugfix in line unbreaking - -fixed a bug in line unbreaking (where clients use an equal sign -followed by a carriage return to break up long lines) causing -some attachments to be unreadable by some mail clients. lines -are now only unbroken in text segments. - -[20030607.1020] jonz: bugfix in attachment boundaries - -fixed a small bug that wrote the boundary twice at the end of -an attachment - -Version 2.6.0.66 ----------------- - -[20030603.1900] jonz: bugfix in line unbreaking - -fixed a bug in line unbreaking (where clients use an equal sign -followed by a carriage return to break up long lines) causing -unquoted signatures ending with an equal sign to be malparsed, -causing the email to become slightly jumbled. - -[20030603.1800] jonz: DSF_CORPUS flag - -added DSF_CORPUS flag for processing messages that are from corpus; -prevents innocent totals/hits from being subtracted when spam corpuses -are fed in. 
- -Version 2.6.0.65 ----------------- - -[20030601.0000] jonz: bugfix for locking - -a bug in the locking mechanism for tools fixed; occasionally could cause -a corrupt dictionary - -Version 2.6.0.64 ----------------- - -[20030525.2300] jonz: bugfix for boundaries - -fixed a bug causing boundaries ending in == to be parsed incorrectly -fixed a bug in parsing boundaries that used = without quotes - -[20030523.2300] jonz: bugfix for attachments - -fixed bug causing attachments to be dropped - -[20030523.2300] jonz: optimizations for large databases - -increased database cache to 4MB and implemented alternative btree -sorting routine to greatly speed up database functions - -[20030523.2000] jonz: addition of libtool/shared libs - -libtool is now implemented to build a shared libdspam library. - -[20030523.1830] jonz: bugfixes - -bugfix for multipart messages that caused message to be truncated -bugfixes to signature management causing some segfaults -bugfixes to crc64 calls, some calls returned a different crc every time - -[20030523.0100] jonz: partial rewrite - -Rewrote dspam engine into libdspam, enabling developers to link in libdspam -to provide "drop-in" spam filtering for their projects. - -Migrated to 64-bit tokens; previous 2.6-Beta databases using 32-bit tokens -will not work with this new version. - -Server-side-signature presently the only signature storage method; looking -into a different method of incorporating signature in emails. - -Implemented tracking of spam misses and false positives. Reported in CGI - -[20030521.2315] jonz: url tokens ignored outside of urls - -tokens found inside urls are ignored as individual tokens, and only -represented as Url*token. 
- -[20030520.0200] jonz: bugfix for base64 decoding - -fixed a bug that failed to decode non-multipart base64 messages - -[20030519.0000] jonz: ignore all html tags without spaces - -ignore all html tags without spaces; frequently used to separate tokens - -[20030519.0000] jonz: ignored collapsible html tags - -html tags are now collapsed (rather than overwritten) to join together tokens that -some spammers separate with such tags. - -[20030518.1500] jonz: addition of dspam_crc tool - -dspam_crc tool converts a string into the numeric crc used for storage in -the dspam dictionary; makes it easier to use dspam_dump and grep for a -particular token - -[20030517.1930] jonz: bugfix for as_spam signature - -fixed a bug causing the signature not to be displayed -on messages marked as spams - -[20030517.1300] jonz: bugfixes - -fixed bugs in signature storage (delete .sig files to fix) -fixed bugs in dspam_purge -fixed bugs causing segfault under some circumstances - -[20030516.0052] jonz: exim documentation corrections by Jerome Alet - -Exim configuration to directors, not routers - -[20030516.0020] jonz: massive rewrite and optimizations - -addition of tbt and lht dynamic data structures -rewrite of debugging functions -rewrite of database functions -conversion to crc32 long integers for token management -addition of dspam_convert to convert old databases -renamed dbdump to dspam_dump, removed dbset/dbdelete - -these rewrites/optimizations convert all tokens to numeric (long) -values, making processing and sorting much faster. tbt implements -a binary tree sorting mechanism eliminating qsort. storing tokens -in numeric format also removes the necessity for the zlib compression -library. - -[20030514.1500] jonz: bugfix in content identification - -small bugfix in content identification that led some emails to miss a -dspam signature - -[20030514.1500] jonz: error message output added to debug - -error messages previously only made it to stderr.
when --enable-debug -option is used, errors are also printed to debug - -Version 2.5.4 - May 14 2003 ---------------------------- - -[20030514.0240] jonz: added autoconf support contributed by Andrew W. Nosenko - -thanks to Andrew W. Nosenko for contributing the files/patches to provide -autoconf support to dspam. please read the README file for instructions. - -[20030514.0200] jonz: changed hash to support ints - -hash.c modified to support ints or character pointers. makes tracking -token frequency much faster. - -[20030513.2345] jonz: bug in dspam_clean corrected - -corrected a bug in dspam_clean causing it to fail - -[20030513.2300] jonz: experimental tokenized rules - -playing with a few experimental tokenized rules - -[20030513.2300] jonz: freebsd makefile setuid root - -modified the freebsd makefile to install as setuid root. this is due to -freebsd's mail.local requiring the ability to change its uid. dspam will -not work correctly on the commandline (for example when reporting false -positives) - -[20030513.0325] jonz: changed probabilities for single-corpus tokens - -probabilities of 0.0100 and 0.0101 were previously assigned to tokens -appearing only in the innocent corpus. this has been changed to -0.0099 and 0.0100 to balance out the 0.9900 and 0.9901 used for tokens -that appear only in the spam corpus. this very small change corrected -3 false positives that appeared. - -[20030513.0250] jonz: added documentation for exim - -documentation thanks to David Shirley - -[20030512.1930] jonz: applied changes submitted by Andrew W. Nosenko - -(DELIMITERS): Plain `^M' character is replaced by appropriate - escape sequence `\r' for avoiding gcc-3.2.2 warning "multi-line - string literals are deprecated" - -(MAX_FILENAME_LENGTH, MAX_USERNAME_LENGTH): Use system-defined - limits when available (for example max. filename length under - Linux is not 128 as hardcoded, but 4096). - -(USERDIR): Define USERDIR only if not defined somewhere else - (e.g.
from command line). Very convenient for building binary - package. - -Version 2.5.3 - May 12, 2003 ----------------------------- - -[20030512.1430] jonz: bugfix for ignored headers - -a bug was fixed that caused all headers to be ignored if a message was stored -as a raw message in the signature database. - -[20030512.1400] jonz: embedded boundary recognition - -added embedded boundary recognition to recognize emails with embedded boundaries, -such as those sent by Eudora when special formatting is enabled. - -[20030512.1200] jonz: documentation - -added better documentation for the correct permissions of the dspam -directories and the correct group memberships for the MTA user. - -[20030512.1200] jonz: locking bugfix - -fixed bug in locking that caused a loop if a lockfile could not be created -(due to file permissions). also increased lock debugging verbosity. - -[20030511.2025] jonz: false positives adjustment - -reported false positives now count each token as innocent 3 times instead of 2, -for faster re-learning. - -[20030511.2010] jonz: header parsing bug - -fixed a header parsing bug that did not carry the original header name -across multiple lines, for example the Received header. - -[20030511.1945] jonz: dspam_purge complete - -dspam_purge completed and expanded to delete old non-qualifying tokens -and defragment/shrink user dictionaries - -[20030511.1945] jonz: rewrite of dspam tools - -dspam tools rewritten to support new spam_record structure. - -[20030511.1945] jonz: implementation of struct spam_record - -new spam_record structure implemented for database storage; includes the last -hit date for new purge tool. subroutines backward compatible to work -with old databases. - -[20030511.1827] jonz: bugfix for lock sleep - -fixed a bug that caused all dspam processes to sleep for 1 second, even -if a lock was successfully acquired on the first try.
- -[20030511.1719] jonz: addition of probability information to spams - -messages marked as spams now include the tokens and probabilities used in -the message - -[20030511.1600] jonz: body tag filtering - -now ignoring body tags. the only frequently used tags that are being -considered are font, img, and meta - -Version 2.5.2 - May 11, 2003 ----------------------------- - -[20030510.1615] jonz: token word joins with punctuation - -token word joins modified to include dollar signs and exclamation points. for -example: - -$S A V E$ - -previously would result in 3 tokens: $S, AV, E$ but now results in one: $SAVE$ - -[20030510.1500] jonz: bugfix for multipart boundary - -a bug that caused multipart boundaries defined without quotes to go undetected -has been corrected. this had resulted in the dspam signature -(or identifier) never making it into the message. for example: - -Content-Type: multipart/alternative; - boundary='~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' - -is now detected correctly - -[20030510.0035] jonz: additional filtering - -added additional filtering to ignore words with control characters, -numbers that are not prefixed with $ or end with %, and any tokens that -do not begin with an alphanumeric character, with the exception of $ and #.
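The three rules in the additional filtering entry above can be written as a single predicate. This Python sketch is illustrative and is not DSPAM's C code; the name `keep_token`, the definition of a control character (below 0x20, or 0x7F), and the treatment of digits with embedded `.` or `,` as numbers are assumptions.

```python
import re

# A bare number: digits, optionally with '.' or ',' inside (assumed shape).
BARE_NUMBER_RE = re.compile(r"^\d[\d.,]*$")


def keep_token(tok):
    """Return True if `tok` survives the three filtering rules."""
    if not tok:
        return False
    # Rule 1: ignore words containing control characters.
    if any(ord(c) < 32 or ord(c) == 127 for c in tok):
        return False
    # Rule 3: a token must begin with an alphanumeric character, $ or #.
    first = tok[0]
    if not (first.isalnum() or first in "$#"):
        return False
    # Rule 2: ignore numbers unless prefixed with $ or ending with %.
    # "$100" and "50%" do not match the bare-number pattern, so they pass.
    if BARE_NUMBER_RE.match(tok):
        return False
    return True
```

Under these assumptions `$100`, `50%`, and `#1` are kept, while `100` and `!!!` are dropped.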
- -[20030510.0020] jonz: bug fix for lock failures - -a bug has been fixed that caused dspam to loop, sending multiple emails -in the event of a lock failure - -[20030509.2100] jonz: Makefile for FreeBSD - -added makefile for freebsd - -[20030509.2015] jonz: procmail fix - -added small fix to accommodate some procmail implementations -that require an empty argument after -a - -[20030509.0130] jonz: addition of dspam_purge - -please see README for more details - -[20030509.0130] jonz: tools to output to stderr - -dspam tools to output to stderr - -[20030509.0130] jonz: removed probability from db storage - -removed the 13-character probability from the hash databases; was -taking up considerable space and wasn't necessary for the calculation. -is backwards compatible, so there is no need to delete any db's. - -[20030509.0040] jonz: ! is now treated as a delimiter - -the ! character has been added to the delimiter list - -[20030508.2330] jonz: added .lock locking mechanism - -added a .lock locking mechanism to prevent database corruption and/or -quarantine mailbox corruption. - -[20030508.1915] jonz: filtering of boundaries - -multipart boundaries are now filtered - -[20030508.1800] jonz: token word joins - -if a token is only one character long, and is adjacent to other similar -tokens, the tokens will be joined to create a single token. for example - -V I A G R A - -will be tokenized as "VIAGRA" - -[20030508.1800] jonz: header array abolished - -the array holding each header line has been replaced with a nodetree -(dynamic data storage) - -[20030508.0800] jonz: bugfix for dspam_clean - -dspam_clean segfaulted after processing the first user signature file. this -was due to an invalid database handle being closed.
the correct handle is -now used - -Version 2.5.1; May 8 2003: --------------------------- - -[20030508.0045] jonz: bugfix for inline comments - -inline comments normally used to break up guilty spam words such as -SEX - -were only partially filtered, leaving gaps between the letters and causing -DSPAM to miss the whole word. this has been corrected to eliminate the space -the comments previously used, bringing the words together for calculation. - -[20030508.0025] jonz: strdup() overusage - -if only one destination user is specified, strdup() is not used to duplicate -the original header/body pairs to pass to process_user() - -[20030507.1130] jonz: bugfix for multiple users - -when multiple users are specified in the local mailer parameters, the first -user process, due to a bug in setting ADD_AS_SPAM, determined whether the -message was spam for all other users. ADD_AS_SPAM is now reset to its original -value prior to each user's calculation. - -[20030507.2200] jonz: increased html filtering - -