TimG Posted October 1, 2008
The Wikipedia article regarding "median" includes this: "If there is an even number of observations, the median is not unique, so one often takes the mean of the two middle values." I have tracked down a few examples of calculating the median of an even number of observations, and each time the mean of the two middle values is used. But is this the only median if a strict mathematical definition of median is used?
MickyB Posted October 1, 2008
Not sure, but it seems rather arbitrary to take the arithmetic mean of the two middle values, rather than the geometric mean.
cherdano Posted October 1, 2008
That depends on the strict mathematical definition that you use. I doubt the Wikipedia definition would be the consensus. Anyway, how can this question matter?
TimG Posted October 1, 2008
"Anyway, how can this question matter?"
Sometimes it's good to know the answer just for the sake of knowing the answer.
han Posted October 1, 2008
In his infinite wisdom, god decided to define the median only for an odd number of observations.
matmat Posted October 1, 2008
i thought the median was the little barrier between the two sides of the road. what do i know...
helene_t Posted October 1, 2008
The arithmetic mean may be arbitrary but it is arguably the most "natural" choice. For some purposes something else may be more natural, for example the geometric mean. But the geometric mean is not defined if one of the two values is zero, or if one is positive and the other negative. So it doesn't surprise me that "all" software uses the arithmetic mean. The median is not so useful for small data sets, and for large data sets it doesn't matter much.
TimG Posted October 1, 2008
"i thought the median was the little barrier between the two sides of the road. what do i know..."
You know what my 12-year-old daughter knows -- honestly, she looked at my 10-year-old son like he was crazy when he said it had something to do with math. Said 10-year-old son was told by his 5th grade teacher that the median is determined by averaging the middle two elements when there is an even number of elements. Nothing really wrong with him following the prescribed procedure, but I want to know if it would be wrong to call one of the two middle elements a median (or any other value between the two middle elements).
helene_t Posted October 1, 2008
Just noticed a funny thing: R gives a number ending in .25 as the 25th percentile of a set of 2*(odd number) integers, and a number ending in .75 for 2*(even number) integers. The converse holds for the 75th percentile. Is this standard? Wikipedia disagrees: it says the ranks should be rounded off so that the quartiles produce numbers that actually occur in the data set. It seems inconsistent that the median is the mean of the two middle observations while the other quartiles are picked among the actual observations.
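The .25/.75 pattern is what linear interpolation between order statistics produces. Below is a minimal Python sketch, assuming the rule is interpolation at the zero-based position (n - 1) * p. That this matches R's default quantile method is an assumption here, and `quantile_linear` is a made-up helper name, not an R or Python library function.

```python
import math

def quantile_linear(data, p):
    """Quantile by linear interpolation at 0-based position (n - 1) * p.

    Assumption: this is the interpolation rule behind the behaviour
    described in the post; the function name is hypothetical.
    """
    xs = sorted(data)
    pos = (len(xs) - 1) * p
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return float(xs[lo])
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

six = list(range(1, 7))    # 2 * 3 consecutive integers
eight = list(range(1, 9))  # 2 * 4 consecutive integers

print(quantile_linear(six, 0.25), quantile_linear(six, 0.75))      # 2.25 4.75
print(quantile_linear(eight, 0.25), quantile_linear(eight, 0.75))  # 2.75 6.25
print(quantile_linear(six, 0.5))                                   # 3.5
```

Under this rule the median of an even-sized set is the mean of the two middle values, so the quartiles and the median are at least treated consistently with each other, even though none of them need occur in the data.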
kenberg Posted October 1, 2008
I have seen, although I don't immediately have a reference, the median for an even number of data points defined as the interval [a,b] where a is the largest number in the smaller half and b is the smallest number in the larger half. [edit: In Probability and Statistics, 2nd ed, pp. 207-208, DeGroot essentially sets up the definitions as I describe. In his terms every number in the interval is called a median, more or less the same as saying that the interval is the median.] Thus the median of 1, 3, 7, 13, 15, 20 is the interval [7,13] (or in DeGroot all numbers from 7 through 13 are medians).
When deciding on such things it sometimes helps to look at what sort of things you might want to be true. For example, suppose that we have an odd number of positive data points x_1, x_2, ... with a median value m. Suppose now we square those numbers, y=x^2, getting data points y_1, y_2, ... . The median of this new data set will be m^2. If we have an even number of data points x_i with median interval [a,b] and do the same maneuver, the median of the y_i would be the interval [a^2,b^2]. The arithmetic mean definition would not have this nice feature. It's not just with squares; any increasing transformation x->y would work that way. We don't really have separate definitions in the even and odd case this way: with an odd number of data points the interval becomes a one-point interval [m,m].
In general, mathematics is governed by what might be called conditional practicality. Once you accept the idea that doing mathematics is a sensible activity, many of the definitions are practical attempts to pursue that activity.
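The squaring argument can be checked on the six data points from the post. This is only an illustrative Python sketch; `median_interval` and `midpoint` are made-up helper names, not functions from any library.

```python
def median_interval(data):
    """Return (a, b): the two middle order statistics.

    For an odd number of points a == b, so the interval is a single point.
    """
    xs = sorted(data)
    n = len(xs)
    return (xs[(n - 1) // 2], xs[n // 2])

def midpoint(interval):
    """Arithmetic mean of the interval's endpoints."""
    a, b = interval
    return (a + b) / 2

data = [1, 3, 7, 13, 15, 20]
squared = [x * x for x in data]

print(median_interval(data))                 # (7, 13)
print(median_interval(squared))              # (49, 169), i.e. (7**2, 13**2)

# The midpoint convention does not commute with squaring:
print(midpoint(median_interval(data)) ** 2)  # 100.0
print(midpoint(median_interval(squared)))    # 109.0
```

The interval is carried to the interval of squares because squaring is increasing on positive numbers, while the midpoint of the squares (109) differs from the square of the midpoint (100).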
Elianna Posted October 1, 2008
"For example, suppose that we have an odd number of positive data points x_1, x_2, ... with a median value m. Suppose now we square those numbers, y=x^2, getting data points y_1, y_2, ... . The median of this new data set will be m^2. If we have an even number of data points x_i with median interval [a,b] and do the same maneuver, the median of the y_i would be the interval [a^2,b^2]."
Isn't that only true if the x_i's are positive?
cherdano Posted October 1, 2008
"i thought the median was the little barrier between the two sides of the road. what do i know..."
"You know what my 12 year-old daughter knows -- honestly, she looked at my 10 year-old son like he was crazy when he said it had something to do with math. Said 10 year-old son was told by his 5th grade teacher that the median is determined by averaging the middle two elements when there are an even number of elements. Nothing really wrong with him following the prescribed procedure, but I want to know if it would be wrong to call one of the two middle elements a median (or any other value between the two middle elements)."
Math course books at any level make many similar decisions to pick one definition among several plausible ones. (Well, actual math books too, for that matter.) In fact, every time I teach calculus I teach some much more questionable definitions.
matmat Posted October 1, 2008
"i thought the median was the little barrier between the two sides of the road. what do i know..."
"You know what my 12 year-old daughter knows -- honestly, she looked at my 10 year-old son like he was crazy when he said it had something to do with math. Said 10 year-old son was told by his 5th grade teacher that the median is determined by averaging the middle two elements when there are an even number of elements. Nothing really wrong with him following the prescribed procedure, but I want to know if it would be wrong to call one of the two middle elements a median (or any other value between the two middle elements)."
so your median child is 11yo?
jdonn Posted October 1, 2008
"i thought the median was the little barrier between the two sides of the road. what do i know..."
"You know what my 12 year-old daughter knows -- honestly, she looked at my 10 year-old son like he was crazy when he said it had something to do with math. Said 10 year-old son was told by his 5th grade teacher that the median is determined by averaging the middle two elements when there are an even number of elements. Nothing really wrong with him following the prescribed procedure, but I want to know if it would be wrong to call one of the two middle elements a median (or any other value between the two middle elements)."
"so your median child is 11yo?"
I don't think you have a median child. Your children's median age is 11. I learned it the way Wikipedia defines it.
matmat Posted October 1, 2008
"I don't think you have a median child. Your childrens' median age is 11. I learned it the way wikipedia defines it."
i don't believe i ever learned this, and I don't EVER recall having to use the median. I understand the purpose for the means of various sorts, and the mode, but i could never figure out what the point of a median was, except to provide additional useless crap to put on the SAT
Roupoil Posted October 1, 2008
Well, it depends on your definition of the median. The "most mathematical" definition is to say that the median is a number which minimizes the sum of the (absolute) distances to all the numbers of your list (and the arithmetic mean is the one which minimizes the sum of the squares of these distances), in which case every point of the interval between the two middle numbers is a median, so why not take the middle of this interval? But there are indeed people who want the median to be one of the numbers of the list, and usually pick the smaller one. A convenient definition for this median is the smallest number for which the cumulative frequency reaches 0.5.
PS: Sorry for the maybe bad translation from French of some of the mathematical words...
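Both definitions are easy to try out. The Python sketch below uses the made-up data points 1, 5, 7, 12 purely for illustration; `sum_abs_dist` and `lower_median` are hypothetical helper names.

```python
def sum_abs_dist(data, m):
    """Sum of absolute distances from m to every data point."""
    return sum(abs(x - m) for x in data)

def lower_median(data):
    """Smallest data value whose cumulative frequency reaches 0.5.

    Returns None for an empty list.
    """
    xs = sorted(data)
    n = len(xs)
    for count, x in enumerate(xs, start=1):
        if 2 * count >= n:  # same test as count / n >= 0.5, without floats
            return x
    return None

data = [1, 5, 7, 12]  # illustrative values only

for m in (4, 5, 6, 7, 8):
    print(m, sum_abs_dist(data, m))
# 4 15
# 5 13
# 6 13
# 7 13
# 8 15

print(lower_median(data))  # 5
```

The sum is flat at 13 everywhere on [5, 7] and larger outside it, which is why the minimization definition yields an interval for an even number of points, while the cumulative-frequency definition picks its left endpoint.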
Echognome Posted October 1, 2008
"The median is not so useful for small data sets, and for large data sets it doesn't matter much."
Not so sure what you mean by this. My understanding is that the median is a more robust statistic than the mean. It's been a while, but I thought the point was that the median is robust to, for example, measurement error. Suppose you had the following data from a poll that was taken:
1, 1, 2, 2, 2, 3, 3, 4, 40
Now the median would be 2 and the mean would be ~6.4. Is 40 a true value or maybe one of the poll takers wrote down the wrong answer?
Another related topic is interquartile range. I was surprised to find when I started working that the IRS and the Quartile function in Excel use different definitions. Personally, I side with the IRS on the matter.
"Interquartile range. For purposes of this section, the interquartile range is the range from the 25th to the 75th percentile of the results derived from the uncontrolled comparables. For this purpose, the 25th percentile is the lowest result derived from an uncontrolled comparable such that at least 25 percent of the results are at or below the value of that result. However, if exactly 25 percent of the results are at or below a result, then the 25th percentile is equal to the average of that result and the next higher result derived from the uncontrolled comparables. The 75th percentile is determined analogously."
The following is the algorithm used to calculate QUARTILE():
1. Find the kth smallest member in the array of values, where k = (quart/4)*(n-1) + 1. If k is not an integer, truncate it but store the fractional portion (f) for use in step 3. And where:
• quart = value between 0 and 4 depending on which quartile you want to find
• n = number of values in the array
2. Find the smallest data point in the array of values that is greater than the kth smallest -- the (k+1)th smallest member.
3. Interpolate between the kth smallest and the (k+1)th smallest values: Output = a[k] + f*(a[k+1] - a[k]), where a[k] = the kth smallest and a[k+1] = the (k+1)th smallest.
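To see the two definitions disagree, here is a minimal Python sketch that follows the wording quoted above. It is not the actual Excel or IRS implementation; the function names and the sample data 1..8 are assumptions made for illustration, and ties are counted as "at or below" in the IRS rule.

```python
def excel_quartile(data, quart):
    """QUARTILE() as described in the post: rank k = (quart/4)*(n-1) + 1 (1-based)."""
    a = sorted(data)
    n = len(a)
    # Integer arithmetic: whole part and remainder of quart*(n-1)/4.
    whole, rem = divmod(quart * (n - 1), 4)
    k = whole + 1   # truncated 1-based rank
    f = rem / 4     # fractional portion
    if f == 0:
        return float(a[k - 1])
    return a[k - 1] + f * (a[k] - a[k - 1])

def irs_percentile(data, pct):
    """Percentile as worded in the quoted IRS text (pct is an integer, e.g. 25 or 75)."""
    xs = sorted(data)
    n = len(xs)
    for x in xs:
        at_or_below = sum(1 for y in xs if y <= x)
        if at_or_below * 100 >= pct * n:
            higher = [y for y in xs if y > x]
            if at_or_below * 100 == pct * n and higher:
                # Exactly pct percent at or below: average with the next higher result.
                return (x + higher[0]) / 2
            return float(x)
    return None  # only reached for empty input

data = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative values only

print(excel_quartile(data, 1), excel_quartile(data, 3))  # 2.75 6.25
print(irs_percentile(data, 25), irs_percentile(data, 75))  # 2.5 6.5
```

On this sample the interpolated range is 2.75 to 6.25 while the IRS-style range is 2.5 to 6.5, so the two interquartile ranges differ (3.5 versus 4.0).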
helene_t Posted October 1, 2008
"I don't think you have a median child. Your childrens' median age is 11. I learned it the way wikipedia defines it."
"i don't believe i ever learned this, and I don't EVER recall having to use the median. I understand the purpose for the means of various sorts, and the mode, but i could never figure out what the point of a median was, except to provide additional useless crap to put on the SAT"
I find medians quite useful. They are much more robust than means (as Gnome describes above). And they make sense for ordered sets, while means only make sense for real numbers (or vectors). Modes, on the other hand, I never found a use for, but that is no doubt just related to the limited scope of my work.
Gnome: the reason why I think the median is not so useful for small data sets is that for normally distributed data it converges more slowly, i.e. for a given sample size you have a wider confidence interval for the median than for the mean.
jdonn Posted October 1, 2008
With medians and modes there are plenty of practical uses; I just never hear them called medians and modes. For example, if you want half the pairs from the first day of the Life Master Pairs to advance to the second day, you find the median matchpoint total, and all pairs who got at least that many will advance. It's just that if you told the players 'you advance with more matchpoints than the median' they would stare at you and drool on the floor.
Mosene Posted October 1, 2008
A median can be quite useful when using large data sets that are skewed (often because they are bounded by zero) rather than normally distributed. A notable use of the median in social science settings is national income data, which is not normally distributed. Using a mean for income statistics distorts the situation of most citizens because the few making large incomes skew the data (upwards, of course). That is a short explanation anyway.
barmar Posted October 2, 2008
"Modes, on the other hand, I never found a use for, but that is no doubt just related to the limited scope of my work."
But it comes up in duplicate bridge. The "field result" is probably the mode.
barmar Posted October 2, 2008
We have software at my company that makes use of the median, and this issue has recently come up. We do network server load balancing and failover, and we have monitoring systems that test the servers to see if they're alive and responding in a reasonable time. 6-8 systems perform a test, and we use the median response time, arbitrarily assigning a very large number (75.0) if the test fails. It's not uncommon for the data set to look something like:
0.01 0.01 0.02 0.03 0.5 0.6 75.0 75.0
If the median is below 4, the server is alive; otherwise it's considered down (it's actually more complicated than that, but the details aren't important).
We're phasing in a new version of the software. The old version always used the lower of the two middle values when there was an even number of values; the new one uses the mean of them. It's not too unusual for exactly half the values to be the 75.0 penalty value, while the others are tiny (because customers sometimes fail to update their firewalls when we announce new testing IPs). When this happens, the old version would declare the server up, but the new one will calculate median = 37.x, so it's considered down.
We've decided not to fix this incompatibility. If 4 test machines are failing, it could just as easily have been 5, so it's an arbitrary distinction to make. The primary purpose of using the median is to throw out real outliers, like in Gnome's example. If the data set is split half and half between two types of values, neither one can really be considered outlying.
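The difference between the two versions can be reproduced with Python's standard `statistics` module. This is only a sketch of the behaviour described, not the company's software; the threshold 4 and penalty 75.0 come from the post, while the function names and the half-failed sample are assumptions.

```python
import statistics

THRESHOLD = 4.0  # "alive" cutoff mentioned in the post
PENALTY = 75.0   # value assigned to a failed test

def is_up_old(times):
    """Old behaviour as described: lower of the two middle values."""
    return statistics.median_low(times) < THRESHOLD

def is_up_new(times):
    """New behaviour as described: mean of the two middle values."""
    return statistics.median(times) < THRESHOLD

typical = [0.01, 0.01, 0.02, 0.03, 0.5, 0.6, PENALTY, PENALTY]
half_failed = [0.01, 0.01, 0.02, 0.03, PENALTY, PENALTY, PENALTY, PENALTY]

print(is_up_old(typical), is_up_new(typical))          # True True
print(is_up_old(half_failed), is_up_new(half_failed))  # True False
print(statistics.median(half_failed))                  # about 37.5
```

With two failures out of eight both rules agree; only the exact half-and-half split separates them, which is the case the post calls an arbitrary distinction.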
hrothgar Posted October 2, 2008
The set of assumptions required to ensure that the mean is the Best Linear Unbiased Estimator (BLUE) is fairly restrictive. You often need to break out medians and IQRs to avoid using a (potentially) biased estimator. I used to run into this issue all the time studying packet delays on TCP/IP networks...
kenberg Posted October 2, 2008
"For example, suppose that we have an odd number of positive data points x_1, x_2, ... with a median value m. Suppose now we square those numbers, y=x^2, getting data points y_1, y_2, ... . The median of this new data set will be m^2. If we have an even number of data points x_i with median interval [a,b] and do the same maneuver, the median of the y_i would be the interval [a^2,b^2]."
"Isn't that only true if the x_i's are positive?"
Yes. I said so, I think.
kenberg Posted October 2, 2008
"Well, it depends on your definition of the median. The 'most mathematical' definition is to say that the median is a number which minimizes the sum of the (absolute) distances to all the numbers of your list (and the arithmetic mean is the one which minimizes the sum of the squares of these distances), in which case every point of the interval between the two middle numbers is a median, so why not take the middle of this interval? But there are indeed people who want the median to be one of the numbers of the list, and usually pick the smaller one. A convenient definition for this median is the smallest number for which the cumulative frequency reaches 0.5. PS: Sorry for the maybe bad translation from French of some of the mathematical words..."
I like this definition of the median a lot. Thanks, I had never thought of this characterization before. It displays the median as a special case of a higher-dimensional problem. And it's another way of presenting the median of an even number of data points as an interval rather than a specific number (any point in [5,7] will minimize the indicated sum if the data points are 1, 5, 7, 12). And it nicely relates to the definition of the mean. All in all, a neat definition.