Adventskalender Day 3 – Fun with SQL: Examining Comments

031

SQL is actually a nice language: We need, in most cases, only a limited vocabulary and the queries follow in her tight grammar the standard English language. However, many beginners shy away from using it, perhaps because here the risk looks bigger to wreck something.

Today we will look at some simple SQL queries that will give us more information about our comments. I recommend to use either a console or the SQL editor from a visual database management tool such as phpMyAdmin or Adminer. This is not a tutorial, more a starting point for your own research.

For the following queries I resort to a comment database that Ramona Hapke has kindly given me. There are 12223 entries.

How many entries are spam?

We look at the column comment_approved. Here WordPress stores the type of the comment. We collect the amount of the different values with COUNT(*) and put it into the variable amount.

Result

The 1 represents approved, real comments. We are already seeing: only 0.85% of all comments are spam. Unfortunately, this is a normal value. The mean part is: MySQL must always go through the entire table to find the real comments for the output in the published articles. With an index, an optimized query structure, this goes faster.

Which indexes are there?

Result

Well, so there is an index on comment_post_ID and one for status and date together. Nothing to worry about. However, I’ve seen that Ramona uses the plugin Subscribe to Double-Opt-In Comments. This adds three columns to the comment database, but no index. Let’s do that.

In MySQL we create an index like this:

We can also use a custom name:

And now we create three indexes for the additional columns so that the queries go a little faster. We don’t need custom names.

The index is now updated with every update of the database. This takes time and extra space, so we should not normally easily generate any index that comes to mind.

Which posts suffer the most spam?

The more popular a post is, the more spam it attracts. Some posts get almost never spam, others are downright overwhelmed. We get the 20 most spammed posts with this query:

Result

Spam per Post

Wow, what an incline! These 11062 comments include almost the entire spam. 92 Posts with comments are there, a total of 123. Now we could protect for these articles the comments with a registration or just close the comment form. Both solutions are not ideal.

Let us look not at the targets of the spam, but at its origins.

Where do most spam comments come from?

Result

Diagramm: Spam per IP

We can work with that. There is so much spam from some IP addresses that we can lock them out safely. This is works in a .htaccess, for example, like this:

However, this is still a bit cumbersome, so Christian will show you soon how to completely prevent comments sent from an IP address already seen as spam.

For today, I hope I showed you how interesting it can be to play around with SQL. Pure SELECT queries are harmless. In the worst case they ask for too much data, and PHP or MySQL take a break.

Post Sharing

Author Avatar

I am a developer and author for MarketPress. In my free time, I am a moderator for several WordPress communities.

Also Interesting

slack-blog

New Plugin: Slack Connector - Connect WordPress, WooCommerce and Slack

by Michael Firnkes

Initially we merely wanted to optimize our own Slack-processes. With automated notifications from our MarkettPress shop, the blog and our forums. The resul ...

Read more
vip

Inpsyde is the first WordPress.com VIP partner in Germany

by Michael Firnkes

Automattic has given us the title of WordPress.com VIP Service Partner, the first in Germany, Austria and Switzerland. Worldwide, there are only eleven com ...

Read more
wooexperts

Inpsyde is a WooExperts Gold Partner of WooCommerce

by Michael Firnkes

A few weeks ago, Inpsyde GmbH with MarketPress became a certified WooExperts Gold Partner. This is confirmation from WooThemes of our agency’s know-h ...

Read more
backwpup

BackWPup Pro: Secure WordPress Backup with Google Drive

by Michael Firnkes

The Pro version of our BackWPup plugin supports the backup of WordPress databases and files to Google Drive. But how do you set something like that up? And ...

Read more

Comments

2 Comments

  1. #2

    Your work inspired me to do a graphic version of spam analysis:
    http://www.antoniorinaldi.it/launch-of-spam-analytics/

One pingback

  1. launch of spam analytics | sei-uno-zero-nove

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">