
Internet Epidemiology


In London in 1854, Dr. John Snow mapped the incidence of cholera and was able to connect the outbreak to that city’s water system.

The recent New York Times article “Research Tracks Down a Plague of Fake Web Pages” considers research by Microsoft and the University of California, Davis that tracked down some of the roots of spam on the World Wide Web and Internet. Conclusions included:

The top two non-commercial TLD spam sources are .edu and .gov.

Additional TLD spam sources are as follows:

TLD      Percentage of spam
.com     4%
.org     11%
.net     12%
.biz     53%
.info    68%

Additional results of the paper included:

That, for doorway domains, the free blog-hosting site had an order of magnitude more spam appearances in top search results than other hosting domains in both benchmarks, and was responsible for about one in every four spam appearances (22% and 29% in the two benchmarks, respectively).

That over 60% of unique .info URLs in the search results were spam, an order of magnitude higher than the spam percentage for .com URLs.

That the domain was behind over 1,000 spam appearances in both benchmarks, and the ~ IP block where it resided hosted multiple major redirection domains that collectively were responsible for 22-25% of all spam appearances.

That, for aggregators, two IP blocks appeared to be responsible for funneling an overwhelmingly large percentage of spam-ad click-through traffic.

That, for advertisers, even ads from well-known websites had a significant presence on spam pages.


    Domain Name System
    Internationalized Domain Name (IDN): IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs. The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS. The following terminology is used when distinguishing between these forms: A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di. A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap, a name consisting exclusively of LDH labels, such as "", is not an IDN.
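The relationship between U-labels and A-labels described above can be sketched in a few lines of Python. This is a minimal illustration, not a full IDNA implementation: it assumes the raw `punycode` codec from the Python standard library is an acceptable stand-in for the ACE step, and it skips the normalization and validity checks that real IDN processing requires. The helper names `to_a_label`, `to_u_label`, and `domain_to_ascii`, and the sample domain, are hypothetical and chosen only for this example.

```python
ACE_PREFIX = "xn--"  # prefix that marks an ASCII-compatible-encoded label


def to_a_label(label: str) -> str:
    """Convert one label to its ASCII form (sketch: no normalization or validation)."""
    if label.isascii():
        # Already ASCII (an LDH-style label): nothing to encode.
        return label
    # Punycode-encode the Unicode label and prepend the ACE prefix.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_u_label(label: str) -> str:
    """Convert an A-label back to its Unicode form; other labels pass through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def domain_to_ascii(name: str) -> str:
    """Apply to_a_label to every dot-separated label of a domain name."""
    return ".".join(to_a_label(part) for part in name.split("."))


if __name__ == "__main__":
    u_label = "परीका"  # the Hindi word used in the glossary entry above
    a_label = to_a_label(u_label)
    # The glossary entry quotes xn--11b5bs1di as the A-label for this word.
    print(a_label)
    print(to_u_label(a_label) == u_label)          # round trip back to the U-label
    print(domain_to_ascii("परीका.example"))        # hypothetical mixed domain name
```

Only the non-ASCII label is encoded; a label that is already ASCII is left unchanged, which mirrors the point that a name made up exclusively of LDH labels is not an IDN.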