Having worked on differentiating ham and spam (for lack of accurate terms with few syllables) since 2001, I have come to the conclusion that it is useful to adjust detection to the special characteristics of the sender's industry.

This work is really focused on ham senders. It is about ways to adjust ham/spam decision algorithms so that ham is not missed, based on the special characteristics of certain kinds of ham senders. I suggest that being able to evaluate mail based on the type of sender reduces both false negatives and false positives.

Nice dream, you say? Luckily I am here to dream and brainstorm and make impossible things happen, so let's explore this all the way.

I've already incorporated a crude classification of sender type into the Outbound Index databases. But it needs work:

 - It needs to be refined. My first thoughts about it are unlikely to be the perfect ones.

 - It needs to be commented on, improved by others' input, and developed collaboratively.

 - The needs of many anti-spam systems / companies / software could possibly be represented in a single useful email industry classification code system.

 - If it were cleverly designed, the system could absorb classification changes and additions without breaking the systems of the people using it each time something changes.

 - Existing industry classification systems should be studied to find their good and bad points and to figure out which ones could apply well to email reputation systems.

I've heard some people say about email senders "It's impossible to list them all" - and that's true. I suggest it isn't necessary to list them all. What I've observed is that I simply need ways to treat a relatively short list of prolific senders differently from each other, and being able to do so is very useful for accuracy.

If a sender needs to be added to the list, add them - and usefulness increases. But there is no "It's pointless until every sender is listed" situation.

I would start with the whole Internet in one group. Now ask which groups have really distinct characteristics for email. I will share with you some of my certainly imperfect thinking:

When I say "senders" I really mean outbound mail servers. Yes, many of them carry a mixture of the types I define below - so for now "mixed" is all we can say about them. As time goes on, they may or may not choose to dedicate servers to particular purposes, and they will be judged by that choice if they mix types of mail that cause problems for each other. I suggest that if the industry creates meaningful logical classifications, it is more likely that servers with dedicated purposes will come into existence.

An open question: are we classifying the *policies* of senders (in which case someone must then measure discrepancies between stated policies and actions), or the *affiliations* of senders (government / military / does non-profit or for-profit matter?), or *who they serve*, i.e. are they providers of internet access to consumers, providers of internet access to businesses, merchants, financial services, or something else?

1. Sources that send for users who can get an account anonymously, and that lack outbound filtering / rate limiting adequate to stop abuse.

2. Private corporate / enterprise sources dedicated to one company or set of related / owned companies. These are not providers of internet access to consumers or resellers of bandwidth or connectivity to other businesses.

3. Providers of internet access to consumers

4. Providers of internet access to businesses

5. Providers of wireless hotspot access, services to travellers such as hotel systems

6. Providers of colocation (colo)

7. Government
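As a rough sketch only, here is one way the seven draft classes above might be carried as numeric codes so that new classes can be added later without breaking existing consumers. The names SenderClass and parse_sender_class, the numeric values, and the UNKNOWN fallback are all my own illustrative assumptions; nothing here is a settled code system.

```python
from enum import IntEnum


class SenderClass(IntEnum):
    """Hypothetical codes for the draft classes listed above."""
    UNKNOWN = 0               # assumption: fallback for unlisted or unrecognized codes
    ANONYMOUS_UNFILTERED = 1  # anonymous accounts, no adequate outbound filtering
    PRIVATE_ENTERPRISE = 2    # dedicated to one company or related companies
    CONSUMER_ACCESS = 3       # internet access for consumers
    BUSINESS_ACCESS = 4       # internet access for businesses
    HOTSPOT_TRAVEL = 5        # wireless hotspots, hotel systems
    COLOCATION = 6            # colo providers
    GOVERNMENT = 7


def parse_sender_class(code: int) -> SenderClass:
    """Map a numeric code to a class.

    Codes this consumer does not know yet degrade to UNKNOWN instead of
    raising, so a newly added class does not break older software.
    """
    try:
        return SenderClass(code)
    except ValueError:
        return SenderClass.UNKNOWN
```

A consumer built before a class 8 existed would read parse_sender_class(8) as UNKNOWN and fall back to its default treatment, which is the "live with additions" property I am after.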

Classification may be the work of Reputation Collectors, or of Reputation Aggregators, or perhaps ideally of an open system in which every record carries a "source". The source is whoever put the record in, that is, whoever thinks the classification is X. You can then choose which source(s) you will use and trust, and selectively drop (ignore) sources you find you don't want to trust. You could set up your own interpretation / voting:

 - Source "ReSpam" says this sender is a school

 - Source "Montgomery Report" says this sender allows anonymous webmail signups and has no outbound rate limiting.
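To make the interpretation / voting idea concrete, here is a minimal sketch in which each record names its source and the consumer supplies its own trust weights. The Claim record, the classify function, the weights, and the sender identifier "mail.example.edu" are hypothetical; only the two source names come from the examples above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    source: str          # who put this record in
    sender: str          # outbound server identifier (format is still an open question)
    classification: str  # what the source thinks the sender is


def classify(claims, sender, trust):
    """Return the classification with the highest total trust weight.

    `trust` maps source name -> weight chosen by the consumer. Sources that
    are missing from `trust`, or weighted 0 or less, are ignored entirely.
    Returns None when no trusted source has an opinion.
    """
    totals = Counter()
    for claim in claims:
        weight = trust.get(claim.source, 0.0)
        if claim.sender == sender and weight > 0:
            totals[claim.classification] += weight
    if not totals:
        return None
    return totals.most_common(1)[0][0]


claims = [
    Claim("ReSpam", "mail.example.edu", "school"),
    Claim("Montgomery Report", "mail.example.edu", "anonymous-webmail"),
]
# This consumer trusts ReSpam more than Montgomery Report.
print(classify(claims, "mail.example.edu", {"ReSpam": 1.0, "Montgomery Report": 0.4}))
```

Dropping a source is then just a matter of leaving it out of the trust table; the shared records themselves never need to change.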

I am not suggesting that this classification system become a reputation system unto itself, with evidence files or an unlimited number of data points. Rather, it should stick to a narrowly defined area: "classifications useful in differentiating characteristics of email sources." Other reputation collectors could reference the ID or classification numbers in this system if they wished to, and attach other data points within their own databases.

Examples of characteristics: Some envelope-from domains may reasonably be expected to originate from a wide range of sending servers around the world. Others will come only from a single place.

Conversely, some outbound servers would be expected to send out mail from a huge variety of domains - say, a web host of vanity or small business domains which share their ISP's outbound servers. Others would be expected to have a very narrow group of envelope-from domains.
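One way to observe this characteristic, sketched under the assumption that you have some log of (server IP, envelope-from address) pairs, is to count distinct envelope-from domains per outbound server. The function name and log format are illustrative only.

```python
from collections import defaultdict


def envelope_domain_counts(log):
    """Count distinct envelope-from domains seen per outbound server.

    `log` is an iterable of (server_ip, envelope_from) pairs. Addresses
    without an "@" (including the empty bounce sender) are skipped.
    Returns {server_ip: number_of_distinct_domains}.
    """
    domains = defaultdict(set)
    for server_ip, envelope_from in log:
        _local, sep, domain = envelope_from.rpartition("@")
        if sep and domain:
            domains[server_ip].add(domain.lower())
    return {ip: len(seen) for ip, seen in domains.items()}


log = [
    ("192.0.2.1", "owner@shop-one.example"),
    ("192.0.2.1", "info@shop-two.example"),
    ("198.51.100.7", "alerts@bank.example"),
]
print(envelope_domain_counts(log))
```

A shared web host would be expected to show a large count and a dedicated corporate server a small one; what counts as "large" for each sender type is exactly what the classification would need to say.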

We (in the Outbound Index Reputation Aggregator) characterize name servers as part of the decision algorithm. It's normal for ISPs with certain policies or offerings to have a high turnover in name server host names and IPs. On the other hand, most hosting companies run a small, fixed number of name server hosts which rarely change names or IP addresses.

As an exaggerated example, it would be uncharacteristic for a bank to have 500 name server hosts within a class C network. We have seen at least one host that I believe has over 30,000 name server host names within a class C network.
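A minimal sketch of that kind of measurement follows, treating "class C network" as an IPv4 /24 (my assumption) and taking (name server host name, IPv4 address) pairs as input. This is not the Outbound Index implementation, just an illustration of the count.

```python
import ipaddress
from collections import defaultdict


def name_servers_per_slash24(records):
    """Count distinct name server host names per IPv4 /24.

    `records` is an iterable of (ns_hostname, ipv4_string) pairs.
    Returns {"a.b.c.0/24": number_of_distinct_host_names}.
    """
    hosts = defaultdict(set)
    for hostname, ip in records:
        # strict=False lets us pass a host address and get its enclosing /24
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        hosts[net].add(hostname.lower())
    return {str(net): len(names) for net, names in hosts.items()}


records = [
    ("ns1.example.net", "192.0.2.10"),
    ("ns2.example.net", "192.0.2.200"),
    ("ns1.example.org", "198.51.100.5"),
]
print(name_servers_per_slash24(records))
```

The resulting counts would then be compared with what is characteristic for the sender's class: a handful for a bank or a typical hosting company, far more for the outliers described above.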

How would a sender's resources be identified? By nethandle, CIDR block, verifiable rDNS patterns, or something else (your idea here)?
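Taking just the CIDR block option as an example, a lookup could pick the most specific block that contains a connecting IP. This is a sketch only; the table contents, labels, and function name are hypothetical, and nethandle or rDNS matching would need different machinery.

```python
import ipaddress


def lookup_classification(ip, table):
    """Return the label of the most specific CIDR block containing `ip`.

    `table` maps CIDR strings to classification labels. When blocks nest,
    the longest prefix wins. Returns None if no block matches.
    """
    addr = ipaddress.ip_address(ip)
    best = None  # (prefix length, label) of the best match so far
    for cidr, label in table.items():
        net = ipaddress.ip_network(cidr)
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, label)
    return best[1] if best else None


table = {
    "192.0.2.0/24": "consumer-access",
    "192.0.2.128/25": "colo",
}
print(lookup_classification("192.0.2.200", table))
```

Longest-prefix matching lets a provider's broad allocation carry one classification while a sub-block dedicated to a different purpose carries another.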

Summary: This is half-baked and needs input from others, particularly on the definition of structure, such as "classify by type of service, suffixes indicate policies" or something better I haven't thought of. I expect to reorganize this post with appropriate headings.

Comments invited.