This is a follow-up to Doctor Q's "Top 50 Posters" thread, but I like specific thread titles, so I created a new thread... (hopefully this time I won't time out and lose this post - I have "Remember Me" checked, crossing fingers!)
In that thread, Doctor Q wrote: "Now if I could just measure post quality, that would be really be something to study!"
I hope to show that yes, there are ways of doing this - not necessarily for individual posts, but for the whole population of posts a poster has made. That is, to judge whether a poster writes quality posts (on average), you can't just look at a few and decide - you need to look at the universe of posts that person has made, or a predetermined subset.
Now, before I start, a few background items:
* I did take a look at the vBulletin documentation, but after a while I gave up; so much is possible even without customization. I may be suggesting things that are difficult or CPU-intensive to do, so the key here would likely be post-processing.
* I didn't consider whether people who post often are valuable to the site. For example, it could be that having people post a lot - even if the posts are not good ones - is still somehow good for Arn/MacRumors. My personal judgement is that people who just post for the sake of posting are not helpful - but I don't sit in Arn's chair either.
OK, let's get to a few ideas.
* Post length/word count per post:
There are two sides to this coin: the contentless "Yeah!" posts, and the TL;DR variety (e.g., this post). If someone consistently makes very short posts, that's a warning signal. While I don't see a lot of TL;DR posts, someone doing that on a consistent basis would likely be an area of concern as well.
Adjustments would have to be made for people quoting others and for code snippets. I'm also unsure whether signatures should be included or discarded.
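To make the quote/code adjustment concrete, here's a rough Python sketch that counts only the words a poster actually wrote. It assumes post bodies are raw vBulletin-style BBCode with [QUOTE]...[/QUOTE] (or [QUOTE=name]) and [CODE]...[/CODE] tags, and that signatures are stored separately from the body; the function names and the short/long thresholds are made up for illustration.

```python
import re
from statistics import mean

# Assumption: bodies are raw BBCode; quotes use [QUOTE]/[QUOTE=name] and code uses [CODE].
# Assumption: signatures live outside the post body, so they never show up here.
# Matches an innermost quote block, i.e. one containing no further [quote] opener.
_INNER_QUOTE_RE = re.compile(
    r"\[quote(?:=[^\]]*)?\](?:(?!\[quote[\]=]).)*?\[/quote\]",
    re.IGNORECASE | re.DOTALL,
)
_CODE_RE = re.compile(r"\[code\].*?\[/code\]", re.IGNORECASE | re.DOTALL)
_WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def own_word_count(post: str) -> int:
    """Count the words the poster wrote, ignoring quoted text and code."""
    text = post
    # Remove innermost quotes repeatedly so nested quotes disappear completely.
    while True:
        stripped = _INNER_QUOTE_RE.sub(" ", text)
        if stripped == text:
            break
        text = stripped
    text = _CODE_RE.sub(" ", text)
    return len(_WORD_RE.findall(text))


def length_profile(posts: list[str], short: int = 5, long: int = 1000) -> dict:
    """Summarize a poster's own-word counts; both thresholds are placeholders."""
    if not posts:
        raise ValueError("need at least one post")
    counts = [own_word_count(p) for p in posts]
    return {
        "mean_words": mean(counts),
        "short_rate": sum(c < short for c in counts) / len(counts),
        "long_rate": sum(c > long for c in counts) / len(counts),
    }
```

The point is the rates: a high `short_rate` is the "Yeah!" warning sign, a high `long_rate` the TL;DR one, and neither is judged from a single post.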
* Spelling/grammar:
This is much harder to do - not everyone here is a native English speaker, people make innocent typos, and code is simply going to throw a spellchecker/grammar checker for a loop. Nor would you want to punish someone for quoting someone else's errors. But technically this can be done, perhaps as a misspelling rate per word written.
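A misspelling rate per word might look something like the sketch below. It pools all of a poster's words, so one two-word post with a typo doesn't swing the result. It assumes quotes and code have already been stripped, and `known_words` is a hypothetical lowercase word list standing in for a real dictionary or spellchecker.

```python
import re

# Matches words, keeping internal apostrophes ("don't", "it's") attached.
_WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def misspelling_rate(posts: list[str], known_words: set[str]) -> float:
    """Pooled share of a poster's words that are not in known_words.

    Assumptions: quoted text and code were removed from each post beforehand,
    and known_words is a lowercase word list supplied by the caller.
    """
    total = unknown = 0
    for post in posts:
        for word in _WORD_RE.findall(post):
            total += 1
            if word.lower() not in known_words:
                unknown += 1
    return unknown / total if total else 0.0
```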
* Hot words usage:
Instead of a full spell check, one could look for words intended to inflame or incite: fandroid, fanboy, fanboi, sheep, sheeple and so on. These terms wouldn't necessarily trip the censoring filter, but this would catch people who use them consistently. Again, you would want a rate of usage, not an absolute number. What would you think of a poster who uses the word "sheeple" in 97% of his 432 posts?
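Here's a minimal sketch of the "share of posts containing a hot word" rate. The word list is just the examples above, with an optional trailing "s" for plurals; a real list would need curating, and a word like "sheep" will produce false positives in, say, a farming thread.

```python
import re

# Word list taken from the examples above; a real one would be curated.
HOT_WORDS = ["fandroid", "fanboy", "fanboi", "sheep", "sheeple"]
# Whole words only, case-insensitive, optional plural "s" ("fanbois", "sheeples").
_HOT_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, HOT_WORDS)) + r")s?\b", re.IGNORECASE
)


def hot_word_post_rate(posts: list[str]) -> float:
    """Fraction of a poster's posts that contain at least one hot word."""
    if not posts:
        return 0.0
    flagged = sum(1 for p in posts if _HOT_RE.search(p))
    return flagged / len(posts)
```

For the hypothetical poster above, 419 flagged posts out of 432 gives a rate of about 0.97. Word boundaries keep innocent words like "sheepish" from matching.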
* Use of quotes in a post:
I honestly have no idea what this might show; I see many considerate posters using quotes very nicely. On the other hand, I also see quite a few arguments carried out through quotes. But what I don't like is when someone posts something like "That is SO true" in a 10-page thread with no quote - with no context, it's useless (it was probably useless in any case; this is just for illustration). So perhaps this would be a rate of non-usage of quotes - but I really don't have a prediction here, other than that I think it could be useful. Funny thing about analytics: things you think will be useful wind up on the floor quite a bit, and things you think are dumb wind up being powerful... until you actually take a look, you don't know.
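One way to compute a "non-usage of quotes" rate is sketched below. The `Post` record and its `position` field (1 = the thread's opening post) are hypothetical, as is the BBCode [QUOTE] format. Raising `min_position` restricts the count to replies made deep into long threads, which is where an unquoted "That is SO true" loses its context.

```python
import re
from dataclasses import dataclass

# Assumption: quotes are marked with BBCode [QUOTE] or [QUOTE=name] tags.
_QUOTE_OPEN_RE = re.compile(r"\[quote(?:=[^\]]*)?\]", re.IGNORECASE)


@dataclass
class Post:
    body: str      # raw post text (hypothetical record)
    position: int  # 1 = first post in the thread


def unquoted_reply_rate(posts: list[Post], min_position: int = 2) -> float:
    """Fraction of replies at or beyond min_position that quote nothing."""
    replies = [p for p in posts if p.position >= min_position]
    if not replies:
        return 0.0
    unquoted = sum(1 for p in replies if not _QUOTE_OPEN_RE.search(p.body))
    return unquoted / len(replies)
```

Whether this number tells you anything is exactly the open question above; the sketch only shows that it is cheap to compute.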
* Up-votes per post rate:
So, here we go. While I'm sure people have thought of this before, the devil is in the details. First off, it's probably best to view an up-vote as a sign of popularity, not quality. I've seen (back in the day when down-votes were available) high-quality, accurate and timely posts get down-voted massively, and inaccurate, contentless posts get voted up. But overall I'd wager that posts with higher up-vote counts tend to be the better posts.
Now, there are challenges here besides what was already said: older posts may still carry negative vote totals, while more recent posts can't, so you couldn't mix posts from before and after Arn's "no down-voting" decision. Also, a post with zero up-votes may now be a really horrid post, but it would be counted the same as a neutral or average one - that is, a post that was perfectly OK but just wasn't interesting or insightful enough to vote up.
Even given that, there's still good data there - but the best measure would be the rate of up-votes per post view. That is, if someone posts something popular at the very beginning of a widely-viewed thread, they're going to get a lot of up-votes simply because a lot of people are viewing that thread. Contrast that with a thread that only had 100 unique readers, yet a post there gets 8 up-votes: that's an 8% rate, which arguably says more than a bigger raw count earned in front of a much larger audience.
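A sketch of up-votes per view, under a few assumptions: the `VotedPost` record is hypothetical, thread unique-reader counts are assumed to be available (a thread's total viewers is only a proxy for how many people actually saw a given post), and the `cutoff` date when down-voting ended has to be supplied, since I don't know it. Posts before the cutoff are dropped rather than mixed in, and per-post rates are averaged so one huge thread doesn't dominate.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VotedPost:
    upvotes: int
    thread_viewers: int  # assumed: unique readers of the thread
    posted_on: date


def upvote_rate(posts: list[VotedPost], cutoff: date) -> float:
    """Average of per-post up-votes per thread viewer, for posts on/after cutoff."""
    rates = [
        p.upvotes / p.thread_viewers
        for p in posts
        if p.posted_on >= cutoff and p.thread_viewers > 0
    ]
    return sum(rates) / len(rates) if rates else 0.0
```

With made-up numbers: 8 up-votes from 100 readers is 0.08, while 50 up-votes in a thread with 5,000 readers is only 0.01, even though the raw count is bigger.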
I have more, but this is probably overload for most folks. Oh yeah, I'd factor length of membership into a lot of these as well.
Mind exercise: what would happen to posting behavior if some user titles were replaced with "Blowhard," "Village Idiot," "Know it all," "Self-Proclaimed Legend," "Blah Blah Blah" and the like?
<ducks and runs>