Dec 13

Logic based CAPTCHA to beat the blog spam bots

Tech

I have been fighting spam in email and on community sites forever. Man, what an arms race.

When it makes sense to only let community members post, life gets simpler, but in many cases you do not want to narrow the field. If I end up on some random blog and they need me to log in? See-ya.

I have tried a lot of stock plugins for handling spam, and although a lot of good work has gone into the likes of Spam Karma, WP-Hashcash, and many others, I have always had problems.

I chose an image CAPTCHA on this blog a long time ago, and spammers can now do OCR and get right through it. I am moving to a new blog shortly, so I will fix that issue then.

The problem with image CAPTCHA (other than the accessibility issues) is that the arms race already leaves you with two bad options: either smart spammers beat it, or the image is so hard to read that HUMANS can’t read it. I have personally been baffled a number of times as I type in what SURELY is the right mix of numbers and letters, only for the system to tell me that I am wrong.

I recently had an attack of spam at soundmoneytips.com (a great little site by the way) and it was the last straw.

The only solution that really made sense was to get out of the herd mentality, and go it alone.

That is why I chose a logic-CAPTCHA that asks a brain-dead simple question that a human finds ‘duh’ but ideally is hard for a computer to grok.

The simple math-based questions (4 + 14) are destined to be beaten: as soon as enough sites use them to reach critical mass, it becomes worth a spammer’s time to write the simple bot that can eval 4 + 14.
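
To show how low that bar is, here is a minimal Python sketch of such a solver. It is my own illustration, not code from any real bot: the function name `solve_math_captcha`, the regular expression, and the choice of supported operators are all assumptions.

```python
import operator
import re

# Hypothetical helper for illustration only: the name, the regex, and the
# set of supported operators are assumptions, not taken from a real bot.
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
_QUESTION = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)")


def solve_math_captcha(text):
    """Return the value of the first 'a op b' expression in text, or None."""
    match = _QUESTION.search(text)
    if match is None:
        return None
    left, op, right = match.groups()
    return _OPS[op](int(left), int(right))


print(solve_math_captcha("What is 4 + 14?"))  # prints 18
```

A dozen lines is all it takes, which is why a question format shared by thousands of blogs does not stay safe for long.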

The plugins that only work if JavaScript is understood also didn’t work for me. The damn spam bots were smart enough to get through the system. I guess it isn’t that hard to embed a JavaScript interpreter, but sheesh!

Anyway, back to logic CAPTCHA. The beauty is that you get to write your own questions that you ask people to answer.

You can even ask things that only your audience would know. For example, on Ajaxian.com we ask questions such as: “What does the X in Ajax stand for?” (even though Ajax isn’t an acronym).

When you get personal like this, you are out of any critical mass. Chances are, unless you are a huge company, the spammers will not think it worthwhile to beat your little set of questions (which you can change too, of course).
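
The whole idea fits in a few lines. Below is a minimal Python sketch, not the plugin I actually run: the names `QUESTIONS`, `pick_question`, `normalize` and `is_correct` are hypothetical, and the accepted answers (including “xml” for the Ajax question) are my assumptions about what you would want to let through.

```python
import random

# Hypothetical question bank. The questions come from this post; the accepted
# answers are assumptions. Swap in whatever your own audience finds obvious.
QUESTIONS = {
    "What are the first four letters in the word British?": {"brit"},
    "What does the X in Ajax stand for?": {"xml"},
}


def pick_question(rng=random):
    """Choose one question at random to show on the comment form."""
    return rng.choice(list(QUESTIONS))


def normalize(answer):
    """Lower-case the answer and collapse whitespace so 'Brit ' still passes."""
    return " ".join(answer.lower().split())


def is_correct(question, answer):
    """True only if the question is known and the answer is an accepted one."""
    accepted = QUESTIONS.get(question)
    if accepted is None:
        return False
    return normalize(answer) in accepted


print(is_correct("What are the first four letters in the word British?", " Brit "))  # prints True
```

On a real site you would probably remember which question was shown on the server side (in the session, say) rather than trust the question text sent back with the form. Changing the questions is then just a matter of editing the dictionary.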

It is a pain for people to have to answer the question, but at least we keep the spam bots at bay.

For now.

7 Responses to “Logic based CAPTCHA to beat the blog spam bots”

  1. Jeff Schiller Says:

    I did that on my own blog very recently. It simply asks a breezy math question (like 4+3). Spambots doing OCR would either think the answer is “4+3” or “4t3”. Either way, I win and they lose…

  2. toyota Says:

    The Regard! Long ago

  3. Breitling replica watches Says:

    I have seen even nicer techniques to prevent spam. There is one (I can’t remember where) that, besides a captcha, also counts how many letters you actually typed, how many copy/pastes you did, whether there are hrefs, and so on. Very complex and highly effective.

  4. Jonathan Says:

    Where can I get this? I have looked and I am not seeing it anywhere?

  5. ali Says:

    hi,

    i have written a script that produces an obscured mathematical equation to be solved by the user for captcha. the equation is obscured by: 1) having words rather than digits, 2) surrounding words by special characters, 3) randomly changing these characters. examples:

    1) s^{Fourty:Five|^s +Plus+ s^|Two}^s =
    2) +Plus+ =
    3) *?Fifteen|* +Plus+ *|Seven?* =
    4) dd#Twenty~Three#bb +Plus+ dd#Two#bb =
    5)+Plus+ =

    as an experienced spam victim, could you possibly tell me how easy or difficult my captcha would be to break? plz see the examples above. it changes on every attempt/submit, including the two words that comprise numbers of 21 and above.

    thanks.

  6. Sinvex Says:

    The problem with logic based captcha is that you cannot have the computer generate an infinite number of different ones. This leads to you having to constantly do more work to keep new ones up there, as someone could easily just sit around and record a database of them and then sell that to spammers or use it themselves. This may sound like a tedious task nobody would want to do, but you can easily contract such a thing out to a random person, similar to the “Chinese gold farmer” industry in MMOs (there are tons of people SOMEWHERE in the world who are willing to do all this stuff for next to nothing).

  7. Andrew Says:

    A math captcha is (logically) only one of various types. What if the captcha were a short video clip that you need to answer a question about? Or an easy puzzle that you need to solve? It would make your page more interactive and interesting :)
    You can read a related article here

Leave a Reply

Spam is a pain, I am sorry to have to do this to you, but can you answer the question below?

Q: What are the first four letters in the word British?