Maths genius need only apply

stridemat · Mar 4, 2010

OK, I need someone with more brain power (or some knowledge of statistics).

I have two sets of related data. Set A was carried out in 2008 and had 800 respondents. Set B was carried out in 2010 and had 40 respondents. The questions used in both sets are the same. Obviously Set B has far fewer responses.

Is there a mathematical way of comparing the validity of Set B results to Set A?

Hope that makes sense, and I hope I've given enough information.

mkrishnan · Mar 4, 2010

How do you define validity? That's not really a mathematical or statistical concept.

Do you mean...
- Do the two data sets measure some latent construct equally well?
- Are the mean responses as reliable in one case as the other?
....

SilentPanda · Mar 4, 2010

Not sure if this will help... but depending on the data it might at least give you a feel good feeling...

http://mathworld.wolfram.com/BenfordsLaw.html

stridemat · Mar 4, 2010

Rereading my original post I could explain myself better.

How could you compare the results gained in Set B to Set A even though the sample sizes are very different? Is there a way of calculating what the results of Set B would be if the sample size were the same as A's?

I know you could just use percentages, but I'm hoping something more 'scientific' and statistical is possible.

bartelby · Mar 4, 2010

stridemat said:
Rereading my original post I could explain myself better.

How could you compare the results gained in Set B to Set A even though the sample sizes are very different? Is there a way of calculating what the results of Set B would be if the sample size were the same as A's?

I know you could just use percentages, but I'm hoping something more 'scientific' and statistical is possible.

For a start, Set B is far too small to even bother using the data.

You need to look into confidence intervals...

mkrishnan · Mar 4, 2010

^^ That really depends on the expected effect sizes. For large effect sizes, 40 can be a very large sample.

stridemat said:
How could you compare the results gained in Set B to Set A even though the sample sizes are very different? Is there a way of calculating what the results of Set B would be if the sample size were the same as A's?

You have a few different options... I don't think mathematics will be a straight guide to which one to use, though. You're going to have to explain what you're doing in more detail.

Are the people who responded to Set A (a superset of) the same people who responded to Set B, are they different people who are supposed to represent the same underlying population, or are they a different group of people representing a different population?

bartelby · Mar 4, 2010

mkrishnan said:
^^ That really depends on the expected effect sizes. For large effect sizes, 40 can be a very large sample.

When comparing to 800?

It's not my answer anyway. It's my wife's. She assesses whether statistics can be used for official UK Government figures.

I figured she'd be a good person to ask.

RedTomato · Mar 4, 2010

I can't see how Benford's Law is relevant here. You're not looking for signs of fraud and made-up data, which is the main application of Benford's Law.

As per the above replies, you'd be looking at confidence intervals and significant deviations and chi-square tests. It would also be good to know how many people were polled - if 5000 people were polled in both studies, and 800 responded the first time, and only 40 the second time, then obviously something significant has changed.

This isn't math genius work - I did chi-squared tests as part of standard school work when I was 17.

EDIT: Mkrishnan just made some good points too.

stridemat · Mar 4, 2010

I will start from the start (always a good place).

I am currently carrying out research into how people's perceptions of sustainable construction have changed due to the recession that the UK is currently experiencing. In order to do this I have found data from 2008 (before the recession) on people's perceptions of sustainable construction (Set A). I have conducted research using the same series of questions, on a different set of respondents, in 2010 (Set B).

The original study received 800 respondents; mine has only received 40 (my tutor says this is fine for undergraduate study). What's the best way of comparing the results I obtained to the previous results?

I have looked at confidence levels and can see how they would be applied. Would this work in this instance?

edit*

confidence intervals and significant deviations and chi-square tests

I think chi-square tests are what I'm looking for.

flopticalcube · Mar 4, 2010

Have you tried a sample size calculator: http://www.surveysystem.com/sscalc.htm

A sample size of about 100 can give 95% confidence with a confidence interval of about ±10 percentage points, even for large populations. 40 seems a bit small to draw any firm conclusions.
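To put rough numbers on that, here is a minimal sketch of the usual margin-of-error formula for a proportion, z * sqrt(p(1-p)/n). It assumes a simple random sample, the worst case p = 0.5, z = 1.96 for 95% confidence, and a population large enough to ignore any finite-population correction. The function names are made up for illustration; this is not necessarily what the linked calculator does internally.

```python
from math import ceil, sqrt

Z_95 = 1.96  # z-value for 95% confidence (normal approximation)


def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the confidence interval for a proportion p with n responses."""
    return z * sqrt(p * (1 - p) / n)


def required_sample_size(margin, p=0.5, z=Z_95):
    """Smallest n whose margin of error is no larger than `margin`."""
    return ceil((z / margin) ** 2 * p * (1 - p))


# The two sample sizes from this thread, plus the 100 mentioned above.
for n in (40, 100, 800):
    print(f"n = {n:3d}: +/- {margin_of_error(n) * 100:.1f} percentage points")

print("n needed for +/- 10 points:", required_sample_size(0.10))
```

Under those assumptions, 40 responses give a margin somewhere around 15 to 16 points either way, against roughly 3.5 points for 800, which is the gap people here are worried about.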

barr08 · Mar 4, 2010

EDIT: not sure if this helps at all, or if it's even what you are talking about, but it was fun to write! This is probably much more simple an answer than the one you are looking for.

I am very far from a math genius, but if you don't mind being sneaky you could just show percentages:

Say, for example, your poll was 'do you prefer A or B?'

In the first poll, 500 out of 800 people chose A. This equals 500/800, or 62.5%.

In the second poll, 17 out of 40 people chose A. This equals 17/40, or 42.5%.

You could technically show the following results:

In 2008, 62.5% of people preferred A to B.
But in 2010, only 42.5% of people preferred A to B.
Or you could say 57.5% preferred B to A.

If you wanted to expand B's results to match 800, you could just cross multiply:

17/40 = x/800
x = 340

So, by the numbers you found in 2010, you can 'guess' that 340 people would have chosen A if a full 800 were polled.

This is how a simple, non-math person looks at it at least. You are technically lying, but what else are statistics used for anyways? 😀
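The cross-multiplication above can be sketched in a few lines, along with the one thing it hides: scaling 40 answers up to 800 does not make the estimate any more precise. This is only a rough sketch using the made-up 500/800 and 17/40 figures from this post and the simple normal-approximation interval with z = 1.96 (my choice of interval, other intervals exist). The function names are hypothetical.

```python
from math import sqrt


def proportion_interval(successes, n, z=1.96):
    """Observed proportion with an approximate 95% interval (normal approximation)."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def scale_to(successes, n, target_n):
    """Cross-multiply the estimate and both interval ends up to target_n respondents."""
    p, low, high = proportion_interval(successes, n)
    return round(p * target_n), round(low * target_n), round(high * target_n)


# Made-up example numbers from the post above, both expressed "out of 800".
set_a = scale_to(500, 800, 800)
set_b = scale_to(17, 40, 800)
print("2008 (n=800): estimate, low, high =", set_a)
print("2010 (n=40):  estimate, low, high =", set_b)
```

The 2010 estimate still comes out at 340, but the plausible range around it is several times wider than the range around the 2008 figure, because it rests on 40 answers rather than 800.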

stridemat · Mar 4, 2010

flopticalcube said:
Have you tried a sample size calculator: http://www.surveysystem.com/sscalc.htm

A sample size of about 100 can give 95% confidence with a confidence interval of about ±10 percentage points, even for large populations. 40 seems a bit small to draw any firm conclusions.

I have already had a look at a confidence calculator and I am going to have a better look when I get back home.

barr08 said:
This is how a non-math person looks at it at least. You are technically lying, but what else are statistics used for anyways? 😀

That's my fallback plan 😛

mkrishnan · Mar 4, 2010

bartelby said:
When comparing to 800?

It's not my answer anyway. It's my wife's. She assesses whether statistics can be used for official UK Government figures.

I figured she'd be a good person to ask.

Again, it really depends on what is being measured. For the kinds of things done in government studies, no, the effect sizes are small, and 40 people will rarely be a good enough sample. Also, 40 people is not a good sample, because the intent is to use that sample to represent the population of a city, province, or country, and there is too much underlying heterogeneity to use 40 people to make a meaningful inference. Neuroscience studies would be a common example of a situation where the supposed underlying heterogeneity is smaller, and the effect sizes are much larger. So, if you look at Nature Neuroscience or another journal like that, for instance, you'd see that 40 people would usually be considered a moderate to large sample size, statistically...

This is why I asked what was being measured, but since it seems that it's the kind of thing that will have low effect sizes...
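As a rough illustration of how much the effect size matters with 800 versus 40 respondents, here is a sketch of an approximate power calculation for comparing two proportions. It uses Cohen's h as the effect size and a normal approximation, both of which are my assumptions for the example rather than anything prescribed in this thread. The percentages are invented (the large gap borrows barr08's 62.5% vs 42.5%), and the function names are hypothetical.

```python
from math import asin, sqrt
from statistics import NormalDist


def cohens_h(p1, p2):
    """Effect size for the difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def approx_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate chance of detecting the difference with a two-sided test.

    Normal approximation; the negligible far tail is ignored.
    """
    h = abs(cohens_h(p1, p2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n_eff = n1 * n2 / (n1 + n2)  # dominated by the smaller sample
    return NormalDist().cdf(h * sqrt(n_eff) - z_crit)


# Invented scenarios: a 20-point shift and a 5-point shift between surveys.
power_large = approx_power(0.625, 0.425, 800, 40)
power_small = approx_power(0.625, 0.575, 800, 40)
print(f"20-point shift: power ~ {power_large:.2f}")
print(f" 5-point shift: power ~ {power_small:.2f}")
```

With these invented inputs, a big shift has a decent chance of showing up even with only 40 new respondents, while a small shift would most likely be missed. Note that the effective sample size is barely above 38, so having 800 in the other group helps less than you might expect.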

stridemat said:
I am currently carrying out research into how people's perceptions of sustainable construction have changed due to the recession that the UK is currently experiencing. In order to do this I have found data from 2008 (before the recession) on people's perceptions of sustainable construction (Set A). I have conducted research using the same series of questions, on a different set of respondents, in 2010 (Set B).

There are a few options you might consider... at a basic level, things like the chi square tests and confidence intervals above are fine. Another option that's more complex, and may or may not be what you want to do, is to take the 800 individuals in the first study, and instead of asking whether your respondents differ from all 800 of them, pick a subset of them who are most "like" your respondents -- e.g., matched for their age, ethnicity, income level, education level, geography, etc. -- and then see if the matched individuals differ from your 40 individuals. The advantage of this is that you can, to some extent, control for these various influences that might be in contention with your proposed mechanism of change (the recession). There's a lot of good (and free) software to do that. R has a package for it.
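To make the matching idea a bit more concrete, here is a toy sketch of greedy nearest-neighbour matching. Everything in it is hypothetical: the `Respondent` fields, the scaling constants and the tiny data set are invented for illustration, and it only works if the 2008 study's individual-level data (not just its totals) is available. Dedicated matching software does this far more carefully.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Respondent:
    age: float              # hypothetical covariate
    education_years: float  # hypothetical covariate
    answer: int             # 1 = agrees with the survey statement, 0 = does not


def distance(a, b, scales=(10.0, 2.0)):
    """Scaled Euclidean distance on the covariates (the scales are arbitrary choices)."""
    d_age = (a.age - b.age) / scales[0]
    d_edu = (a.education_years - b.education_years) / scales[1]
    return sqrt(d_age ** 2 + d_edu ** 2)


def greedy_match(targets, pool):
    """For each target, take the closest pool member that has not been used yet."""
    available = list(range(len(pool)))
    matched = []
    for target in targets:
        best = min(available, key=lambda i: distance(target, pool[i]))
        available.remove(best)
        matched.append(pool[best])
    return matched


# Toy stand-ins for the 800 earlier respondents (pool) and the 40 new ones (targets).
pool = [Respondent(25, 16, 1), Respondent(60, 12, 0),
        Respondent(41, 18, 1), Respondent(33, 13, 0)]
targets = [Respondent(40, 17, 0), Respondent(27, 15, 1)]

matched = greedy_match(targets, pool)
print("matched group agreement:", sum(r.answer for r in matched) / len(matched))
print("new group agreement:    ", sum(r.answer for r in targets) / len(targets))
```

The matched subset is then compared with the new respondents in the usual way, using whichever test you had already settled on.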

stridemat · Mar 4, 2010

Thanks for all your answers. I have a lot of research on this ahead of me. The chi-square tests look promising; I've just got to remember how to work them out, as it's been a long time.
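As a reminder of how a chi-square test on a 2x2 table is worked out, here is a minimal sketch. It uses barr08's made-up counts from earlier in the thread (500 of 800 vs 17 of 40 choosing A), not real survey data, applies no continuity correction, and the p-value shortcut only holds for one degree of freedom, i.e. a 2x2 table. The function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a_yes, a_no, b_yes, b_no):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    observed = [[a_yes, a_no], [b_yes, b_no]]
    row_totals = [sum(row) for row in observed]
    col_totals = [a_yes + b_yes, a_no + b_no]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if year and answer were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: 2008 had 500 for A and 300 for B; 2010 had 17 for A and 23 for B.
chi2, p_value = chi_square_2x2(500, 300, 17, 23)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

For those invented counts the statistic comes out at roughly 6.4 with a p-value around 0.01. One caveat: the test gets unreliable when any expected cell count is small (a common rule of thumb is below 5), which is worth checking with only 40 respondents spread over several answer options.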

Don't panic · Mar 4, 2010

I agree on the chi-square for this situation.

What you likely won't be able to do is correlate answers, if the test had multiple questions.

Rodimus Prime · Mar 4, 2010

You can compare the results. Just break them down into percentages and you can compare them directly, and since group B is larger than 30 there is no need to mess with the small sample size conversion.

It has been a while since I took statistics, but your problem is a basic statistics problem. I do not remember exactly how to do what you need, but I do know it can be done, and I remember doing it multiple times during my statistics classes.
