User Comments
Looks good so far. By the way, the comments page no longer validates as XHTML since you added the Captcha. You might want to look into that. |
Hey, it works well. That’s pretty damn cool, mate. Good work. This will be really handy in my guestbook program, if I ever finish it. |
This is good, but ImageMagick is tricky to install. If it’s already installed, though, your idea is very clever. I have seen this on other websites too; I think it is used on Yahoo. |
Marie: Glad you like it, and I’m delighted that you have found a use for it. Thanks for bringing that validation issue to my attention; I’ll sort it out in the next day or two. ==awesum==: I have never had any problems installing ImageMagick, but I am aware that some people find it troublesome. I don’t install it directly, though. I use Perl’s CPAN (http://www.cpan.org/) to install the PerlMagick package, and this package installs ImageMagick itself. So, if you need ImageMagick but are having trouble installing it, use CPAN from the *nix shell: run “perl -MCPAN -e shell”, then type “install Image::Magick” at the cpan> prompt. This will fetch all the relevant files, then “make”, “make test” and “install” everything you need. |
Hey, great article - you really covered all the bases! I kept thinking about the accessibility issue for blind users, and right there at the end you’ve got it addressed. Great job. I don’t know if it’s possible given the software you’re using, but you might want to try playing around with streaming the image directly to the browser instead of temporarily saving it to the filesystem. I’m doing that in the charting portion of my company’s software using the jCharts charting package, which nicely provides a method for doing so. |
Jennifer, you’re right - streaming the image is exactly what I need to do. That would make the system more secure and would eliminate one of the clean-up tasks. This was my original plan and I spent hours and hours trying to “make it so”, but I was ultimately unsuccessful and so I chose to write the image to the file-system and access it from there. If anyone can offer me any information on how to stream an image I would be eternally grateful… |
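On the question of streaming: one common approach is for the script that renders the Captcha to write the image bytes straight into its HTTP response, with the page’s img tag pointing at that script, so nothing is ever saved to disk. The Python below is a minimal sketch of that idea, assuming a CGI-style handler whose response goes to standard output; the image bytes would come from whatever renderer is in use (PerlMagick here), which is out of scope. The function names are hypothetical.

```python
import sys
from typing import BinaryIO, Optional


def build_image_response(image_bytes: bytes, content_type: str = "image/png") -> bytes:
    """Assemble a CGI response: headers, a blank line, then the raw image bytes."""
    headers = (
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(image_bytes)}\r\n"
        "Cache-Control: no-store\r\n"  # a Captcha image should never be cached
        "\r\n"                         # blank line ends the headers
    )
    return headers.encode("ascii") + image_bytes


def stream_image(image_bytes: bytes, out: Optional[BinaryIO] = None) -> None:
    """Send the image directly to the client instead of writing a temporary file."""
    if out is None:
        out = sys.stdout.buffer  # under CGI, stdout is the HTTP response
    out.write(build_image_response(image_bytes))
    out.flush()
```

Because the image never touches the filesystem, the clean-up task disappears. The trade-off is that the server must still remember which word it rendered (for example, keyed by a token in the img URL) so the visitor’s answer can be checked later.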
Nicely done. I don’t know if you want a hint for your Perl code, though: there’s no need to read in the entire dictionary just to pick a random word. See ‘perldoc -q “random line”’. |
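The perldoc entry mentioned describes picking a random line in a single pass: keep the n-th line with probability 1/n. Every line ends up equally likely, and only one line is held in memory at a time. Here is the same idea sketched in Python (reservoir sampling with a reservoir of one); the dictionary file name in the usage note is hypothetical.

```python
import random
from typing import Iterable, Optional


def random_line(lines: Iterable[str], rng: Optional[random.Random] = None) -> Optional[str]:
    """Return one line chosen uniformly at random, reading the input only once.

    Only the current candidate is kept in memory, so a large dictionary
    file never has to be loaded in full. Returns None for empty input.
    """
    if rng is None:
        rng = random.Random()
    chosen = None
    for n, line in enumerate(lines, start=1):
        # Keep the n-th line with probability 1/n.
        if rng.randrange(n) == 0:
            chosen = line
    return None if chosen is None else chosen.rstrip("\n")
```

A file object already iterates line by line, so it can be passed directly, e.g. random_line(open("words.txt")).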
As a developer I understand the comment spam problem and everything that goes with it. Despite this, I still find your attitude to disabled people disappointing. “Thus I have an accessible channel to the comments system for visually-impaired users. All I need to do is add a few guidance notes to the CAPTCHA (or to an accessibility statement) to that effect and that should ensure that accessibility isn’t compromised.” The very reason people suffer from disablement is that things are made harder for them to do. Yes, it is possible for a disabled person to post comments, but you are actively disabling them further by making them jump through unnecessary hoops. In either case you are also cutting out anyone who cannot, or chooses not to, display images. I do not believe that any visual CAPTCHA system is satisfactory. In my opinion, any system must be fully usable from Lynx. |
Noah, you are absolutely right. I am not claiming that my system is perfect, and I acknowledge that some users will be terribly inconvenienced by the Captcha. I wrote in Defending Against Comment Spam, “I have never suffered [comment] spam via the Urban Mainframe.” Why then, with no spam problem, did I implement the Captcha? There were two reasons:
In practical terms, yours is the first serious criticism I have received of the system. Now, to be perfectly honest, I don’t know how significant that is. I don’t know if I have any disabled readers. I don’t know if the Captcha is preventing non-disabled readers from commenting. I have no metrics, no empirical data. I welcome feedback on all aspects of this website: functionality, design, implementation, UI, architecture, navigation, content, etc. Without that feedback, I cannot make any sound decisions as to what works and what doesn’t. From a functional perspective, I designed the Captcha to be switchable from the start. Thus, if I receive enough feedback indicating that it is a problem, I can simply switch it off.
I don’t like this any more than you do, Noah. But this is a difficult call for me: I have little enough time to invest in the Urban Mainframe as it is, without having to moderate comments. I wish there were a perfect, secure, accessible comments-handling mechanism, but I haven’t found one yet. What do you suggest? |
Thank you for taking my comments seriously. The problem of comment spam and its countermeasures has been plaguing me recently and I simply can’t stop thinking about it. People have come up with many, many methods, ranging from visual CAPTCHAs to a whole registration process. These all have their drawbacks, and I have been spending a lot of time thinking of a solution that would let a casual surfer post a comment easily, without any inconvenience, and yet stop robots in their tracks. Another requirement in my thought experiment was that the system be fully usable in the Lynx browser. In addition to all of this, I realised that although many of my solutions would work as standalone solutions on one of my own personal blogs, I needed to think of something that would scale up, so to speak: something that would work as an MT plugin, or if the vast majority of the blogging community adopted the method. Note that it would also have to work perfectly well on just one blog; many methods I have seen proposed would require the entire weblogging community to participate, which in effect renders them useless. Then today, on a boring train journey home, it hit me what the solution was! I realised what the actual problem with implementing such a system was. A guy named Alan Turing wrote a lot about Artificial Intelligence (A.I.) and devised a method to test machine intelligence, which he called the Turing Test. If you are not familiar with the test I suggest you do some googling, as it is too large a topic to go into here. Basically, he stated that machines were intelligent when they could trick a human, via some sort of remote computer connection, into thinking that they were also human. What lies at the heart of the comment spam problem is, in essence, a variation of the Turing Test: we are challenging a human to prove their “human-ness”, so to speak. That in itself is not a hard experiment.
Imagine yourself in a chat room talking to other people. After a few minutes talking to any one of the other “people” in the chat room, I am positive you could determine which, if any, were just cleverly programmed “bots”. However, the problem in this case is getting a computer to judge a Turing Test carried out on a human, which is in essence the reverse of the problem Turing devised. While thinking about this, it occurred to me that, as a culture, our programming knowledge is not yet sufficient to build a computer that will reliably pass the Turing Test, so how could we expect to program one that could conduct it? So what we need is a way for a site’s author to personally conduct his own Turing Test on everyone who wants to comment… Stupid, ludicrous, impossible, I hear you say! Well, perhaps not… Consider this: every time the site’s author publishes a new post, the content management system or blogging software asks the author to specify a question. It would be a very simple question that even the youngest of humans could answer. Some examples include:
The system then asks the author for a set of accepted answers such as:
Once this simple process has been completed, the post is published to the site. When a user tries to comment, they are challenged, on the same page, with a simple and unique question. Provided the question is simple enough, the user should have no trouble typing an answer. The system would then check the answer against the list of accepted answers, allowing for spelling mistakes etc. If the user did not pass the test, the comment is simply held back for moderation. The site’s author can then view all comments awaiting moderation. If a comment is spam, the author could simply click a button to blacklist the URLs it contains. This system would not work with a long list of preset questions, because spammers would get hold of the source code and adapt their bots accordingly. On top of this, further checks could be made before a user is even allowed to comment. Firstly, make sure the referrer of the comment’s page is what it should be. Yes, I know this can be spoofed, but it’s worthwhile anyway. Secondly, make sure the user is not trying to post a comment only seconds after initially loading the post. Any normal user would wait at least a few seconds between requests; a spam bot may not. Thirdly, make sure that no one can post a duplicate comment within an hour of another one. If there are successive attempts, keep pushing back the time limit until a duplicate can be posted. So there you have it… it’s only a rough outline of my ideas. Let me know what you think. |
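Two of the pre-checks Noah describes are easy to sketch. The Python below is a minimal illustration, not a finished implementation, of the referrer check and of a duplicate-comment window that doubles on each repeated attempt. Doubling is one assumed reading of “keep pushing back the time limit”. The timing check is left out because it needs a server-side record of when the form was served. The class name and the in-memory dictionary (standing in for persistent storage) are hypothetical. A False result means the comment is held for moderation, not rejected outright.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse

DUPLICATE_WINDOW = 3600.0  # one hour, as suggested in the comment


class CommentGate:
    """Hypothetical pre-checks run before a comment is accepted."""

    def __init__(self, site_host: str) -> None:
        self.site_host = site_host
        # normalised comment text -> (time of last attempt, current window in seconds)
        self._recent: Dict[str, Tuple[float, float]] = {}

    def referrer_ok(self, referrer: Optional[str]) -> bool:
        # The Referer header can be forged, so this is only a first line of defence.
        return bool(referrer) and urlparse(referrer).hostname == self.site_host

    def duplicate_ok(self, text: str, now: float) -> bool:
        key = " ".join(text.lower().split())
        if key in self._recent:
            last, window = self._recent[key]
            if now - last < window:
                # Repeated attempt inside the window: refuse and push the limit back.
                self._recent[key] = (now, window * 2)
                return False
        self._recent[key] = (now, DUPLICATE_WINDOW)
        return True

    def allow(self, referrer: Optional[str], text: str, now: float) -> bool:
        """True if the comment may be published; False means hold it for moderation."""
        return self.referrer_ok(referrer) and self.duplicate_ok(text, now)
```

A spammer who varies each message slightly will slip past the duplicate check, so it complements the question-and-answer test rather than replacing it.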
Is there any other way?
I agree. Solutions that rely on registration on a master-server don’t really appeal to me either. If my website is being targeted by the spammers then I would prefer to address the problem locally.
I’m familiar with it. Have you met Gina? :-)
Which is exactly what we are doing with the Captcha of course. However, I do appreciate that my implementation is a visual one and that visitors with vision impairments are likely to be unable to proceed past the device.
While it’s a great idea, Noah, it does introduce problems of its own:
It’s amazing how sometimes the simplest things can have the biggest impact. Noah, you’ve hit on something right here. I could retain my Captcha system if I were to make it non-compulsory. Consider the following process:
I think this workflow enjoys the benefit of the Captcha without impacting usability/accessibility. What do you think?
I concur. Indeed, any web-form processor should check for a valid referrer as a first line of defence.
Tricky to achieve neatly on a stateless web!
This is in fact a feature of the comment handler on this website. Thanks for your interesting ideas and suggestions Noah. Believe me, I am taking them seriously. You will probably see a few changes to the comments handler here within the next few weeks. |
I keep getting: Database Error I have encountered the following error while performing database operations. Consult with your System Administrator or ISP. Return to the previous page, or use the “Back” button. |
This is a test: ` When these characters are in a post they are causing the comments handler to barf - and these characters will often appear since I use SmartyPants to generate them in posts that are copied-and-pasted here! Hopefully this is fixed now… |
Yes… :)
heh :)
Although not ideal, I do agree that this would be better than at present.
PHP sessions would solve this, though I am not qualified to know if this is possible in other server-side languages. |
I don’t publish any content (at the time of writing) in any language other than English. However, the CMS I use is a commercial product, used by customers who do publish in multiple languages. Any system I implement has to be usable in that environment.
That is major-league cool stuff! Thanks for pointing this out, Noah. I’m going to have to investigate further, since I could really make use of the “soundex” function.
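For reference, the classic American Soundex algorithm behind PHP’s soundex() is short enough to sketch. The Python below is an illustrative reimplementation, not PHP’s own code. It assumes ASCII input and ignores non-letters. In a question-and-answer scheme it could be used to accept answers that merely sound like an accepted one.

```python
# Soundex digit for each consonant; vowels, Y, H and W have no code.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code, e.g. "R163" for Robert."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                  # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate letters with equal codes
        code = _CODES.get(ch, "")        # vowels and Y have no code and reset the run
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")          # pad short codes with zeros
```

For example, soundex("recieve") and soundex("receive") both give "R210", so a common misspelling would still match.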
I did misunderstand. I thought you were talking about a site-wide Q/A system rather than one that was post-specific.
That’s true. But my “human test” (Captcha) is autonomous, whereas a Q/A system is not. That is what I meant.
Me too. I will investigate further…
I don’t use PHP. Nor do many other sites. My backend application is written in Perl and C. Whilst sessions are possible with this combination (and relatively easy to implement) the stateful session, to my knowledge, is not. However, that probably wouldn’t matter if the other defences are properly implemented. |
From a background of CMS development, I can tell you that if a CMS can handle posts in multiple languages, it could handle questions/answers in multiple languages. Depending on the licence of the CMS, either a hack/module needs developing or a request needs to go to the publishers. I don’t know much about MT as I’ve never used it, but I have heard a lot about the vast array of plug-ins you can get. Surely this idea could be implemented in MT via a plug-in?
That’s just the surface! :) Check out the Metaphone algorithm, developed by Lawrence Philips. This algorithm is available in PHP as “string metaphone ( string str)”. Also, you might want to look at the Levenshtein algorithm. This one is great: it calculates the “Levenshtein distance” between two strings.
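The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a Python sketch of the standard dynamic-programming calculation (not PHP’s built-in levenshtein()), together with one hypothetical way to check a visitor’s answer against the author’s accepted answers while tolerating small spelling mistakes. The one-edit tolerance is an assumption. For very short answers it may be too lenient, since “cat” is one edit from “car”.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a                  # the distance is symmetric; keep the row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def answer_accepted(answer: str, accepted: Iterable[str], max_distance: int = 1) -> bool:
    """Hypothetical check: case- and whitespace-insensitive, allowing small typos."""
    def normalise(s: str) -> str:
        return " ".join(s.lower().split())

    guess = normalise(answer)
    return any(levenshtein(guess, normalise(a)) <= max_distance for a in accepted)
```

A failed check would simply send the comment to the moderation queue, as in Noah’s outline.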
Hehe… I had to pick you up on this one! While Perl (which is rather long in the tooth) is in widespread use amongst bloggers, à la MT, you will find that PHP is in use on over 16,251,453 domains, and with a 52.65% market share last year, it’s hard to question the dominance of our favorite language. :) See: http://www.php.net/usage.php http://www.phpfreaks.com/articles/172/0.php http://www.sitepoint.com/blog-post-view.php?id=170246
http://www.w3j.com/6/s3.stein.html There doesn’t appear to be much info on stateful sessions in Perl, but I am sure it is possible WITHOUT COOKIES. There is no point using cookies, as spam bots will just ignore them. Instead, generate a unique id for each post request and put this id in a hidden form element. When the user (or spam bot) submits the form, you can compare the submission time against the time the unique id was set. Simple. |
I know. I don’t know what I was thinking of. I blame the lack of sleep! :-)
I’m sure it could, if it hasn’t already. I’m only familiar with the Markdown and SmartyPants plug-ins. I’ve never deployed MT, so I have no knowledge of what else is out there.
Wow, this is great stuff. Thanks for these pointers Noah. I’m not going to sleep for weeks now! ;-)
There’s no way I’m going to get involved in a discussion about languages. In my opinion, it’s a toolkit and one simply chooses whichever tool is required for the job in hand. I like Perl, it’s that simple.
I’m not sure it is. Obviously sessions are possible with Perl, but maintaining a stateful connection to a web-server (or the illusion of the same) is not something that I’ve ever come across in the Perl world. To be fair, I’ve never had this as a requirement anyway. |
Glad to know I helped! :)
Oooh… My bad. Now that was a flame war just waiting to happen! heh :) Yeah, you’re absolutely right though, just different toolkits for the same job. Oh! About sessions: you wouldn’t need a session if you simply stored a temporary id in a table against a timestamp. When the user submits the comment form, with the id hidden in a form element, the script compares timestamps. If the form does not have an id, or the id is incorrect (i.e. non-existent), then the comment is treated as failed and held back for moderation. The only time this would ever happen is if the form had been modified by the end user… i.e. spam bots. |
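Noah’s scheme needs only a few lines of code. In this Python illustration a dictionary stands in for the database table, the token is a random hexadecimal string from the standard secrets module, and each token can be used only once. The five-second minimum and two-hour lifetime are assumed values, and the class name is hypothetical. The purge method covers the clean-up overhead: tokens for forms that are never submitted have to be removed periodically.

```python
import secrets
import time
from typing import Dict, Optional

MIN_AGE = 5.0          # assumed: a human takes at least a few seconds to write a comment
MAX_AGE = 2 * 3600.0   # assumed: tokens older than this are treated as stale


class FormTokens:
    """Hypothetical store of form ids issued with the comment form."""

    def __init__(self) -> None:
        self._issued: Dict[str, float] = {}  # token -> time issued (stands in for a table)

    def issue(self, now: Optional[float] = None) -> str:
        """Create an id to embed in a hidden form field."""
        token = secrets.token_hex(16)
        self._issued[token] = time.time() if now is None else now
        return token

    def check(self, token: Optional[str], now: Optional[float] = None) -> bool:
        """True if the comment may be published; False means hold it for moderation."""
        now = time.time() if now is None else now
        issued = self._issued.pop(token, None) if token else None  # single use
        if issued is None:
            return False  # missing or unknown id: the form was tampered with
        return MIN_AGE <= now - issued <= MAX_AGE

    def purge(self, now: Optional[float] = None) -> int:
        """Delete stale tokens and return how many were removed."""
        now = time.time() if now is None else now
        stale = [t for t, issued in self._issued.items() if now - issued > MAX_AGE]
        for t in stale:
            del self._issued[t]
        return len(stale)
```

No cookie is involved: the only state is the server-side table and the id that travels with the form.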
Skillfully defused by yours truly! :-D
True. I use a similar mechanism for tracking the Captchas themselves. The only real drawback here is that there is some clean-up overhead. I appreciate that this is a small price to pay for the very obvious benefits offered. |
Thank you for posting the dictionary file you used. The only problem we found was that the words of four characters or fewer include “sex,” “shit,” etc. This becomes an issue (one that can easily be fixed) in family-friendly uses. -Nick Clark |
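Once a blocklist has been chosen, filtering the word list is simple. Here is a minimal Python sketch. It assumes a hypothetical blocklist supplied by the site owner and matches whole words only, ignoring case, so it will not catch offensive strings inside longer words.

```python
from typing import Iterable, List, Set


def family_friendly(words: Iterable[str], blocklist: Set[str]) -> List[str]:
    """Return the dictionary words that are not on the blocklist (case-insensitive)."""
    blocked = {w.strip().lower() for w in blocklist}
    kept = []
    for word in words:
        w = word.strip()          # drop newlines from lines read out of a file
        if w and w.lower() not in blocked:
            kept.append(w)
    return kept
```

The filter can be run once over the dictionary file and the result saved, so the work is not repeated on every Captcha request.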
Thanks for advising me of this problem Nick. I’ll scan through the dictionary when I get a minute to make it more “family-friendly”. |
I just read on Wiki that there is a way to circumvent and defeat CAPTCHAs by fooling humans into doing the reverse Turing work for you on a different website: the spammer pulls your image and presents it to users there, disguised as something else. With the accessibility issues already widely reported, I really don’t know whether CAPTCHAs are the right panacea for the problem. |
You have a point, but that method is mainly used to circumvent CAPTCHAs used by the likes of Yahoo and Hotmail so the spammers can get thousands of free email accounts. When it comes to blogs, I doubt any such system (which must be fairly complex) would ever be used. The only reason anyone would want to get past a CAPTCHA on a blog would be to crop-dust thousands upon thousands of blogs with spam. For this reason, when spammers hit a few CAPTCHAs here and there, I cannot imagine them having the motivation to pursue it. It would simply be uneconomical for them. Putting aside the accessibility issues, blog comment forms seem quite a reasonable place for CAPTCHAs… until a generally more elegant solution is found. |
Personally, I hate having the Captcha on my comments handler. I believe it’s a real barrier to some users. I also know that some Urban Mainframe readers won’t post comments here, because they don’t approve of the Captcha system. Captchas aren’t perfect, I agree. But what is? I am prepared to accept the inconveniences of the Captcha because I simply don’t want to have to handle comment spam. Since I introduced the system, I have had only 2 spam comments, something of a record for a weblog. I have alternatives available. I could switch the Captcha off today if any of the alternatives were any better:
I hate having to justify myself like this. However, this is my weblog. I have to maintain it. I don’t have time to deal with comment spam, so the Captcha system is going to remain in place, at least until I find a better alternative. |
I wish a Captcha-type system were not necessary. Unfortunately it is. Recently I was hit by several thousand comment spam postings from the same outfit. I did a little research and found over 100,000 blogs, forums, and other comment-type pages clogged with their garbage. I have many websites for which I’m responsible. On those sites for which the comments were not necessary, I used option 3. “* No comments - I could disable comments completely.” On those sites for which the comments were a necessary part of the website, I monitored the comments, deleting errant postings after the fact. This worked until recently. On one site that was prone to abuse, I moderated comments before they were posted. This has the unsatisfying delay you mentioned. I tried restricting comments to registered users. Unfortunately, the new breed of content spammers have registration bots (or humans) that pave the way for spamming by pre-registering. I believe the tactic is to spam massively and then run, hoping that the job of deleting the spam comments is so onerous that the comments will be left. Judging by 100,000 pages surviving long enough to be googled, I suspect this strategy works for them. So that leaves me with either turning off all my forums or using a Captcha system. Which brings me to your page. Thanks for posting the Perl you are using. This cuts some development time from my end. Much appreciated. |
Great - some more code. Cheers Jon!
Now I’m going to log out and try the captcha (hope I spelt that correctly).