# Maths genius need only apply

Discussion in 'Community Discussion' started by stridemat, Mar 4, 2010.

### Staff Member

Joined:
Apr 2, 2008
Location:
UK
#1
OK, I need someone with more brain power (or some knowledge of statistics).

I have two sets of related data. Set A was carried out in 2008 and had 800 respondents. Set B was carried out in 2010 and had 40 respondents. The questions used in both sets are the same. Obviously, Set B has far fewer respondents.

Is there a mathematical way of comparing the validity of the Set B results with Set A?

Hope that makes sense, and I hope I've given enough information.

2. ### mkrishnan Moderator emeritus

Joined:
Jan 9, 2004
Location:
Grand Rapids, MI, USA
#2
How do you define validity? That's not really a mathematical or statistical concept.

Do you mean...
- Do the two data sets measure some latent construct equally well?
- Are the mean responses as reliable in one case as the other?
....

3. ### SilentPanda Moderator emeritus

Joined:
Oct 8, 2002
Location:
The Bamboo Forest
#3

### Staff Member

Joined:
Apr 2, 2008
Location:
UK
#4
Rereading my original post, I could have explained myself better.

How could you compare the results from Set B with those from Set A, even though the sample sizes are very different? Is there a way of calculating what the Set B results would be if its sample size were the same as Set A's?

I know you could just use percentages, but I'm hoping something more 'scientific' and statistical is possible.

5. ### bartelby macrumors Core

Joined:
Jun 16, 2004
#5

For a start, Set B is far too small to be worth using.

You need to look into confidence intervals...
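To make the confidence-interval point concrete, here is a minimal Python sketch using the normal-approximation (Wald) interval for a proportion. The counts (500 of 800 and 17 of 40 giving the same answer) are made up for illustration, not taken from either survey.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 gives an approximate 95% interval.
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because the approximation can overshoot near the edges.
    return max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical counts: how many respondents gave a particular answer.
ci_a = wald_ci(500, 800)  # Set A (2008)
ci_b = wald_ci(17, 40)    # Set B (2010)
for label, (lo, hi) in (("Set A", ci_a), ("Set B", ci_b)):
    print(f"{label}: {lo:.3f} to {hi:.3f} (width {hi - lo:.3f})")
```

With these made-up counts the Set B interval is roughly 4.5 times as wide as the Set A one (about ±15 points against ±3). The Wald interval is known to behave poorly for small samples or proportions near 0 or 1; the Wilson interval is a common, safer alternative.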

6. ### mkrishnan Moderator emeritus

Joined:
Jan 9, 2004
Location:
Grand Rapids, MI, USA
#6
^^ That really depends on the expected effect sizes. For large effect sizes, 40 can be a very large sample.

You have a few different options... I don't think mathematics will be a straight guide to which one to use, though. You're going to have to explain what you're doing in more detail.

Are the people who responded to Set A (a superset of) the same people who responded to Set B, are they different people who are supposed to represent the same underlying population, or are they a different group of people representing a different population?

7. ### bartelby macrumors Core

Joined:
Jun 16, 2004
#7
When comparing to 800?

It's not my answer anyway. It's my wife's. She assesses whether statistics can be used for official UK Government figures.

I figured she'd be a good person to ask.

8. ### RedTomato macrumors 68040

Joined:
Mar 4, 2005
Location:
.. London ..
#8
I can't see how Benford's Law is relevant here. You're not looking for signs of fraud and made-up data, which is the main application of Benford's Law.

As per the above replies, you'd be looking at confidence intervals and significant deviations and chi-square tests. It would also be good to know how many people were polled - if 5000 people were polled in both studies, and 800 responded the first time, and only 40 the second time, then obviously something significant has changed.
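For the chi-square route, a minimal Python sketch of Pearson's test of homogeneity on a 2×2 table (survey year by answer) could look like this. The counts are hypothetical and no continuity correction is applied. The function also reports the smallest expected count, since the usual rule of thumb wants every expected count to be at least 5.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of homogeneity for a 2x2 table (no continuity correction).

    Returns (statistic, p_value, smallest_expected_count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both surveys shared the same answer distribution.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, min_expected


# Hypothetical counts: rows are surveys, columns are [chose A, chose B].
table = [[500, 300],  # 2008 survey (Set A), n = 800
         [17, 23]]    # 2010 survey (Set B), n = 40
stat, p, min_exp = chi_square_2x2(table)
print(f"chi2 = {stat:.2f}, p = {p:.3f}, smallest expected count = {min_exp:.1f}")
```

`scipy.stats.chi2_contingency` does the same job for larger tables; note that it applies Yates' continuity correction to 2×2 tables by default, so its p-value will come out somewhat larger than this uncorrected one.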

This isn't math genius work - I did chi-squared tests as part of standard school work when I was 17.

EDIT: Mkrishnan just made some good points too.

### Staff Member

Joined:
Apr 2, 2008
Location:
UK
#9
I will start from the start (always a good place).

I am currently carrying out research into how people's perceptions of sustainable construction have changed due to the recession that the UK is currently experiencing. In order to do this, I have found data from 2008 (before the recession) on people's perceptions of sustainable construction (Set A). I have conducted research using the same series of questions on a different set of respondents in 2010 (Set B).

The original study received 800 respondents; mine has only received 40 (my tutor says this is fine for undergraduate study). What's the best way of comparing the results I obtained with the previous results?

I have looked at confidence levels and could see how they would be applied. Would this work in this instance?

EDIT: I think chi-square tests are what I'm looking for.

10. ### flopticalcube macrumors G4

Joined:
Sep 7, 2006
Location:
In the velcro closure of America's Hat
#10
Have you tried a sample size calculator: http://www.surveysystem.com/sscalc.htm

A sample size of about 100 can give 95% confidence with a margin of error of about ±10 percentage points, even for large populations. 40 seems a bit small to draw any firm conclusions from.
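The calculation behind that kind of sample-size calculator is usually the standard formula for estimating a proportion, n = z² p(1 - p) / E², taking p = 0.5 as the worst case. Here is a minimal Python sketch that ignores the finite-population correction (which only matters when the sample is a sizeable fraction of the population):

```python
import math


def required_sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Sample size needed to estimate a proportion to within +/- margin (z = 1.96 for 95%)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Half-width of the approximate confidence interval for a proportion (worst case p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)


print(required_sample_size(0.10))    # sample needed for +/-10 points
print(f"{margin_of_error(40):.3f}")  # worst-case margin with 40 respondents
print(f"{margin_of_error(800):.3f}") # worst-case margin with 800 respondents
```

On this formula, about 97 respondents are needed for ±10 points, while 40 respondents give a worst-case margin of roughly ±15.5 points, against about ±3.5 points for 800.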

11. ### barr08 macrumors 65816

Joined:
Aug 9, 2006
Location:
Boston, MA
#11
EDIT: not sure if this helps at all, or if it's even what you are talking about, but it was fun to write! This is probably a much simpler answer than the one you are looking for.

I am very far from a math genius, but if you don't mind being sneaky you could just show percentages:

Say, for example, your poll was 'do you prefer A or B?'

In the first poll, 500 out of 800 people chose A. This equals 500/800, or 62.5%.

In the second poll, 17 out of 40 people chose A. This equals 17/40, or 42.5%.

You could technically show the following results:

In 2008, 62.5% of people preferred A to B.
But in 2010, only 42.5% of people preferred A to B.
Or you could say 57.5% preferred B to A.

If you wanted to scale B's results up to match 800, you could just cross-multiply:

17/40 = x/800
x = 340

So, by the numbers you found in 2010, you can 'guess' that 340 people would have chosen A if a full 800 were polled.
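The cross-multiplication gives a point estimate, but it does not make the 2010 sample any more precise: the uncertainty scales up by the same factor. Here is a minimal Python sketch using the Wilson score interval (a standard interval for proportions that behaves better than the simple normal approximation at small n), applied to the same hypothetical 17-of-40 example:

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


successes, n, target = 17, 40, 800
point = successes * target / n  # the cross-multiplication: 17/40 = x/800
# Scale both ends of the interval by the same factor as the point estimate.
lo, hi = (target * bound for bound in wilson_interval(successes, n))
print(f"scaled estimate {point:.0f}, 95% interval about {lo:.0f} to {hi:.0f}")
```

On these made-up numbers, the "340 out of 800" comes with an interval of very roughly 228 to 462, which is why the scaled figure should not be read as if 800 people had actually been asked.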

This is how a simple, non-math person looks at it, at least. You are technically lying, but what else are statistics used for anyway?

### Staff Member

Joined:
Apr 2, 2008
Location:
UK
#12
I have already had a look at a confidence calculator and I am going to have a better look when I get back home.

That's my fallback plan.

13. ### mkrishnan Moderator emeritus

Joined:
Jan 9, 2004
Location:
Grand Rapids, MI, USA
#13
Again, it really depends on what is being measured. For the kinds of things done in government studies, no, the effect sizes are small, and 40 people will rarely be a good enough sample. Also, 40 people is not a good sample, because the intent is to use that sample to represent the population of a city, province, or country, and there is too much underlying heterogeneity to use 40 people to make a meaningful inference. Neuroscience studies would be a common example of a situation where the supposed underlying heterogeneity is smaller, and the effect sizes are much larger. So, if you look at Nature Neuroscience or another journal like that, for instance, you'd see that 40 people would usually be considered a moderate to large sample size, statistically...

This is why I asked what was being measured, but since it seems that it's the kind of thing that will have low effect sizes...
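To put rough numbers on the effect-size point: a common back-of-the-envelope formula for the smallest difference between two proportions that a two-sided test at the 5% level would detect with 80% power is (z_0.975 + z_0.8) * sqrt(p(1 - p)(1/n1 + 1/n2)), taking p = 0.5 as the worst case. It is an approximation (it uses a single p for both groups), so treat this Python sketch as a rough guide only:

```python
import math
from statistics import NormalDist


def min_detectable_difference(n1: int, n2: int, alpha: float = 0.05,
                              power: float = 0.80, p: float = 0.5) -> float:
    """Approximate smallest difference in proportions detectable by a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return (z_alpha + z_beta) * math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


for n1, n2 in ((800, 40), (800, 800)):
    print(f"n1={n1}, n2={n2}: about {min_detectable_difference(n1, n2):.2f}")
```

By this approximation, 800 against 40 would only reliably pick up a shift of around 23 percentage points, whereas 800 in both years would pick up about 7 points. Small effects would mostly go undetected with 40 respondents.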

There are a few options you might consider... at a basic level, things like the chi-square tests and confidence intervals above are fine. Another option that's more complex, and may or may not be what you want to do, is to take the 800 individuals in the first study, and instead of asking whether your respondents differ from all 800 of them, pick a subset of them who are most "like" your respondents -- e.g., matched for their age, ethnicity, income level, education level, geography, etc. -- and then see if the matched individuals differ from your 40 individuals. The advantage of this is that you can, to some extent, control for these various influences that might be in contention with your proposed mechanism of change (the recession). There's a lot of good (and free) software to do that. R has a package for it.
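A rough sketch of the matching idea, in Python rather than R: for each new respondent, pick the most similar unused respondent from the earlier survey, using whatever background variables both surveys recorded. The variable names and data below are hypothetical, and this greedy nearest-neighbour approach is only the simplest version; dedicated matching packages typically add propensity scores, calipers and balance diagnostics.

```python
import math
from statistics import mean, pstdev


def match_nearest(new_group, old_pool, keys):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Each covariate is standardised (z-scored) over both groups combined so that
    no single variable dominates the Euclidean distance. Assumes old_pool has
    at least as many records as new_group.
    """
    combined = new_group + old_pool
    centre = {k: mean([r[k] for r in combined]) for k in keys}
    spread = {k: pstdev([r[k] for r in combined]) or 1.0 for k in keys}

    def z(record):
        return [(record[k] - centre[k]) / spread[k] for k in keys]

    available = list(range(len(old_pool)))
    matches = []
    for person in new_group:
        target = z(person)
        best = min(available, key=lambda i: math.dist(target, z(old_pool[i])))
        available.remove(best)  # without replacement: each old respondent used once
        matches.append(old_pool[best])
    return matches


# Hypothetical respondents; "income_band" is an invented ordinal code.
set_b = [{"age": 25, "income_band": 2}, {"age": 60, "income_band": 5}]
set_a = [{"age": 24, "income_band": 2}, {"age": 61, "income_band": 5},
         {"age": 40, "income_band": 3}]
matched = match_nearest(set_b, set_a, ["age", "income_band"])
print(matched)
```

The matched subset of Set A could then be compared with Set B using a chi-square test or confidence intervals, just as with the full data.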

### Staff Member

Joined:
Apr 2, 2008
Location:
UK
#14
Thanks for all your answers, I have a lot of research on this ahead of me. The chi-square tests look promising; I've just got to remember how to work them out, it's been a long time.

15. ### Don't panic macrumors 603

Joined:
Jan 30, 2004
Location:
having a drink at Milliways
#15
I agree on the chi-square test for this situation.

What you likely won't be able to do is correlate answers, if the test had multiple questions.

16. ### Rodimus Prime macrumors G4

Joined:
Oct 9, 2006
#16
You can compare the results. Just break them down into percentages and you can compare them directly, and since group B is larger than 30, there's no need to mess with the small-sample-size conversion.
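Comparing the percentages directly can be made formal with a two-proportion z-test, which is the percentage-based equivalent of the 2×2 chi-square test (the square of z equals the uncorrected Pearson chi-square statistic for the same table). For proportions, the "more than 30" rule is usually replaced by a check that each group has enough answers of each kind; a threshold of 10 is a common rule of thumb. Here is a minimal Python sketch with hypothetical counts:

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test; returns (difference, z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p1 - p2, z, p_value


def normal_approx_ok(x: int, n: int, minimum: int = 10) -> bool:
    """Rule of thumb: at least `minimum` 'yes' and 'no' answers in the group."""
    return x >= minimum and n - x >= minimum


# Hypothetical counts: 500 of 800 in 2008 and 17 of 40 in 2010 gave the same answer.
diff, z, p = two_proportion_z_test(500, 800, 17, 40)
print(f"difference {diff:.3f}, z = {z:.2f}, p = {p:.4f}")
print(normal_approx_ok(17, 40))
```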

It has been a while since I have taken statistics, but your problem is a basic statistics problem. I don't remember exactly how to do what you need, but I do know it can be done, and I remember doing it multiple times during my statistics classes.