Stats on October’s deleted comments

In the interests of openness, we have compiled statistics for the comments we deleted from the blog last month, October 2007, broken down by person and by type of comment. Apologies for the late posting: we only noticed that October’s post was missing while compiling the stats for November. We list only those who had three or more comments deleted. Six people had two comments deleted, and the rest had just one.

The figures do not perfectly represent what appears on the blog, because replies to a deleted comment are also removed from the site. A reply may have been perfectly reasonable, but it will not appear.

In total the site received 1,599 comments in October. Of these, 1,409 were automatically removed by spam software, and we deleted 40 of the remaining 190, or just over 21 percent. This is much higher than we are happy with, and resulted from a combination of an ongoing wave of spam and persistent irrelevant posts from one individual who has appeared frequently in these deleted-comment summaries. We are still improving our filters for spam comments and trackbacks. They appear to be working, as we have recently noticed a move away from automated spam towards human-posted spam, where people post seemingly relevant comments in order to add a link to their own website.
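For anyone who wants to check the percentage, the arithmetic can be sketched in a few lines of Python. The variable names are ours, chosen for illustration; the three input figures are the ones quoted above.

```python
# Figures quoted in the post for October 2007.
total_received = 1599          # all comments submitted to the site
removed_by_spam_filter = 1409  # removed automatically by spam software
deleted_by_moderators = 40     # deleted by hand from what was left

# Comments that got past the spam filter.
remaining = total_received - removed_by_spam_filter

# Share of those remaining comments that were deleted by hand.
share = deleted_by_moderators / remaining * 100

print(f"{remaining} comments passed the spam filter")
print(f"{share:.1f}% of those were deleted by hand")
```

Note that the percentage is taken over the comments left after spam filtering, not over all comments received.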

The stats are given in tables below:

Deleted comments on blog — October 2007
Type of comment

Above are the deleted comments broken down according to type. These types are outlined in our Comment Policy. We may add or remove types over time as the situation demands.


Domain Name System

Internationalized Domain Name (IDN)

IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9". The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms. A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test" (परीका), appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di.

A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH-labels overlap, a name consisting exclusively of LDH labels, such as "" is not an IDN.
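The relationship between U-labels, A-labels, and LDH labels can be sketched with Python's built-in `idna` codec. This is a minimal illustration under stated assumptions, not a reference implementation. The codec implements the older IDNA 2003 rules (RFC 3490), whereas the A-label and U-label terminology comes from the later IDNA 2008 revision, so results may differ for some characters. The helper names `to_a_label`, `to_u_label`, and `is_ldh_label` are hypothetical, and the German label "bücher" is our own example rather than one taken from the glossary.

```python
import string

# Characters allowed by the "hostname rule": ASCII letters, digits, hyphen.
LDH_CHARS = set(string.ascii_letters + string.digits + "-")


def to_a_label(label: str) -> str:
    """Encode one label to its ASCII-compatible form (IDNA 2003 codec)."""
    return label.encode("idna").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode one ASCII-compatible label back to its Unicode form."""
    return a_label.encode("ascii").decode("idna")


def is_ldh_label(label: str) -> bool:
    """True if the label uses only ASCII letters, digits, and hyphens."""
    return bool(label) and all(ch in LDH_CHARS for ch in label)


if __name__ == "__main__":
    u_label = "bücher"  # illustrative label, not from the glossary
    a_label = to_a_label(u_label)
    print(a_label)                 # ASCII form stored in the DNS
    print(to_u_label(a_label))     # Unicode form shown to the user
    print(is_ldh_label(u_label))   # contains a non-ASCII letter
    print(is_ldh_label(a_label))   # A-labels use only LDH characters
    print(to_a_label("example"))   # a plain LDH label passes through as-is
```

The last two checks show the overlap the glossary mentions: every A-label is made of LDH characters, but a plain LDH label such as "example" is left unchanged by the encoding and is not an IDN.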