Arachige D (2015) An Analysis of Hominin Cranial Capacity Data Using Simulations. Anthropol 3: 157. doi: 10.4172/2332-0915.1000157
What did the reviews say?
I am really indebted to all the people who spent their time reading the above paper and commenting on it. Thank you very much. Many comments were negative. However, I don’t see a strong argument against what my analysis pointed at: the possibility that the cranial capacity data point to punctuated equilibrium rather than gradualism. What I set out to do was to investigate the claim by Henneberg et al. about gradualism. What I found is that gradualism is not supported by their data. My effort was totally directed at being unbiased. In the end, I believe there is no evidence in the cranial capacity data to send Gould and Co to oblivion. Readers can also see whether the gradualist arguments hold any water. I believe that readers wouldn’t let an argument stand in the way of observed data.
I only intend to respond to a few of the reviews I received on this paper, as they are the strongest arguments against my conclusions. In general, I am amused by the tendency of the scientific community to discredit the above paper on precariously weak logic. If the paper were difficult to follow and the misunderstandings arose from that, I have to accept the blame entirely. However, I sometimes wonder whether science at times is about people’s entrenched beliefs rather than factual evidence. When undeniable arguments are presented on a topic, human nature sometimes tends to rationalise reasons for ignoring them, owing to preconceived ideas about the person presenting them and the accepted opinions about the topic itself. I wonder whether this had something to do with the opinions expressed below, as they are weak arguments set against another set of data-driven arguments. This opinion does not diminish in any sense the gratitude the author has for the people who made these comments. I can only say a big ‘Thank you’ to show my sincere gratitude. The unedited comments from the reviewers appear below in bold letters.
I don’t find this paper sufficiently convincing to have me believe it makes an important advance in our understanding of brain size evolution. The author does a novel analysis of the fossil cranial capacity data, and concludes that there is evidence for two periods of change, with the changeover at ~1 MYA. I believe there are enough questions about the author’s statistical analysis that this conclusion is not any more or less convincing than previous studies coming to the same conclusion.
-This is not a claim one would consider sufficiently convincing to dismiss the paper. The review acknowledges the fact that the author used a novel analysis; the statistical analysis is then dismissed as unconvincing. Let us look at the objections to the analysis.
It is hard to follow all the statistical steps the author makes, to be sure they aren’t creating an effect where there isn’t actually one. As one example:
What would happen if you created simulated data that was based solely on the gradual curve that Henneberg et al. give as their best fit for hominin cranial capacities over time, and then did the K-means clustering on that? Would one get the same result (where assuming there are 2 groups gives the biggest drop in within-group SS)? In other words, is k-means making a 2 group solution because of its mathematics, and not necessarily because there actually were 2 groups?
-It is very strange for someone who is wondering whether the author’s conclusions are based on mathematical artefacts to ask the author to simulate data based on Henneberg’s gradual curve and test for groupings. If the author were to simulate gradual change using Henneberg’s model, would the author be looking at the real data or artificial data? Would the conclusions then be based on reality or on mathematical artefacts? To simulate data, one has to make assumptions so that the simulations are ‘grounded’ in reality. In the present analysis the author made assumptions about the distributions of the measurements of the cranial capacity of an individual skull. This is more grounded in reality than creating a simulated data sequence across all skull specimens. Should someone do further grouping on artificial data or on the observed data? Also note that the paper uses unsupervised learning to identify the two groups while looking at the stability of such groups under simulations.
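To make the stability idea concrete, here is a minimal sketch in Python (not the author’s R code; the specimen values are invented for illustration): each capacity value is repeatedly perturbed with lognormal measurement error, k-means with k = 2 is re-run, and we record how often each specimen lands in the same group as in the reference solution.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means on the rows of X; returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def two_groups(time, cap):
    """k = 2 labels on standardised (time, capacity); the earlier cluster
    is always coded 0 so labels are comparable across runs."""
    X = np.column_stack([time, cap])
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    lab = kmeans(Z, 2)
    if time[lab == 0].mean() > time[lab == 1].mean():
        lab = 1 - lab
    return lab

# Invented data: 60 "specimens" with a capacity jump around 1 MYA.
rng = np.random.default_rng(1)
time = np.linspace(-3.0, -0.05, 60)                  # millions of years ago
cap = np.where(time < -1.0, 700.0, 1300.0)           # cc, two regimes
cap = cap * np.exp(rng.normal(0.0, 0.05, cap.size))  # measurement noise

ref = two_groups(time, cap)
agreement = []
for _ in range(200):
    noisy = cap * np.exp(rng.normal(0.0, 0.05, cap.size))  # perturb capacities
    agreement.append((two_groups(time, noisy) == ref).mean())

mean_agreement = float(np.mean(agreement))  # close to 1.0 for stable groups
```

A grouping that survives this kind of perturbation is evidence that the two-cluster structure reflects the data rather than one particular set of measurement errors.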
Secondly, isn’t it the case that any time you split data like this into two parts, and fit curves to these independently, you are going to get a better fit, no matter what? This is because you are able to capitalize on the unique variance characteristics of each segment. Given this, showing that the estimates are better for the two parts is not a good demonstration that there really are two evolutionary trajectories. The author could have tested this by arbitrarily splitting the data at random points (not using his preferred k-means clustering method), and see if the lines estimated for the two sections are also different. If they are (as I suspect they will be), then showing the differences between the k-means clustering derived sections is not particularly convincing (it would be an artefact of ANY split of the data).
-The above is a claim totally against statistical reality. Why do people test a single regression line against multiple lines for the same data if a statistically significant better fit is guaranteed whenever the data are split? I don’t claim to be a brilliant statistician. However, if there is no statistical evidence, a statistician will not accept that multiple lines are better than a single line. This author did a statistical test to validate this. I am a pragmatist and like to do my statistics on the observed data. This inclination prevents me from opting for tangential excursions into artificial sophistication. Why should someone split the data randomly and test whether the splits are different? Note that the author used a kind of scree diagram, which is equivalent to splitting the data from 2 groups to 15 groups. The k-means algorithm is used as an unsupervised learning technique in many areas of statistics, including data mining. If it shows evidence of some group structure, which has some back-up evidence, statisticians don’t go hunting for other groups unless they are looking for groups with a hidden agenda. Even then, it is very difficult to justify a series of artificial splits. Any practising data miner will confirm this. It is strange to suggest artificial splits on one hand and warn about artificial splits on the other.
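The scree-style diagnostic mentioned above can likewise be sketched (Python, invented data standing in for the fossil record): compute the within-group sum of squares for each number of groups from 1 to 15 and ask where the big drop occurs. With a genuine two-group structure, the drop from one group to two dwarfs every later drop; under a purely gradual trend no single drop stands out in the same way.

```python
import numpy as np

def kmeans_wss(X, k, iters=50, seed=0):
    """Plain k-means on the rows of X; returns within-group sum of squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return sum(((X[labels == j] - centers[j]) ** 2).sum() for j in range(k))

# Invented data with a capacity jump around 1 MYA, as in the paper's setting.
rng = np.random.default_rng(1)
time = np.linspace(-3.0, -0.05, 60)
cap = np.where(time < -1.0, 700.0, 1300.0) * np.exp(rng.normal(0.0, 0.05, 60))

X = np.column_stack([time, cap])
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # cluster on standardised distances

wss = [kmeans_wss(Z, k) for k in range(1, 16)]
drops = [wss[i] - wss[i + 1] for i in range(len(wss) - 1)]
best_k = int(np.argmax(drops)) + 2         # number of groups after biggest drop
```

With the two-regime toy data above, `best_k` comes out as 2, which is the scree-diagram reading described in the response.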
Thirdly, the actual split is at 1 MYA, not .5 MYA, yet the author focuses on the latter. Yes, at that point, using his (possibly artifactual) second curve, the rate of change is greater than earlier. But it is by definition (in his model) part of one ‘process’ defined by that second curve which starts at 1 MYA. So how does this fit with supposed hominin species? And what about the problematic nature of hominin species, which are based on anatomical considerations that are under fairly constant reassessment? This reassessment means that our ability to place a particular fossil in a particular taxon (or even to define the taxa sampled in the first place) is more of a best-guess based on the limited data at any given point in time. I worry that the author is placing too much confidence in this placement of taxa in particular categories. If you look at plots of the actual data points (which the author does not include), it is very hard to believe one can prove that a two-curve model is truly better, particularly if you ignore the tentative species assignments that the specimens have been placed in. Henneberg’s graph of the actual data, as I remember, makes this pretty clear.
-The actual split is at one million years according to the data, unless the people who collected the data didn’t measure the skulls carefully enough for that to be the case. The author emphasised the last half million years because that part showed a rapid increase. Any model is an artificial fit to the data and agrees with the general trend, with ‘errors’ around it. If the author fitted a different model, things might look different. However, this is a model fitted by eminent anthropologists, discussed by many people, and even accepted by textbook authors. If the model is good enough to prove that the data follow Darwinian gradualism, why is it not good enough when it can also be used to show that the cranial capacity data do not necessarily follow gradualism? Even after doing away with the tit-for-tat part of the above point, we are left with an intuitive analysis of the same data not supporting the original claim of gradualism.
So I don’t find this paper sufficiently convincing to make it an important advance in our understanding of brain size evolution. People have indeed made the claim that punctuation can be inferred from the fossil cranial capacity data, and others have made the opposite (or have called into question our ability to know given the current data set). This paper applies a new set of analyses that claims to support some sort of change in brain size evolution at ~1 MYA. However, I don’t believe the analysis is strong enough to support that claim.
– The argument I posited in my paper was that the original authors didn’t have strong evidence to reject punctuated equilibrium. I would think there should be something more than a personal belief behind rejecting the claim made in the paper about a non-gradual increase in cranial capacity.
This manuscript attempts to apply simulation methods to the problem of analyzing the evolutionary history of hominin cranial capacity. The methods used do not strike me as particularly appropriate to the data. For example, the use of k-means on what are essentially time series data is extremely unusual. I have not searched the literature for whether there is a previous application, but I highly doubt that one could find any kind of cluster analysis where one of the variables is time. The notation, or at least the explanation of it, used in the manuscript is also poor. For example, the equation for Hamming’s distance and its explanation does not seem correct. It can’t be that we are looking for cases that are in a particular cluster five _and only_ five times or 995 _and only_ 995 times, correct? But this is really moot, because as I said, cluster analysis for time series data does not make much sense.
– This is another claim which sounds weak. That a technique has not been used on time series does not mean that its use in this instance is illogical. The cranial capacity data do not represent a true time series; if they did, time-series wizards the world over would have developed hundreds of models to predict future cranial capacity. The techniques in the paper were not used on the raw data: as every statistician knows, clustering is usually done on a distance measure, in this case Euclidean distance. As the time gaps and the cranial measurements do not follow a linear trend until a trend is imposed using a model, the interrelationships between individual specimens and their times of existence can be looked at using distances in two dimensions. This may not be ideal, as there would be some influence of the time trend, but otherwise it is not easy to see the reason for the two-group solution. With regard to the Hamming distance, I only ask the reader to be the judge. What I do at the end is add up a string of 1s and 0s to get the distance. And I was not looking only at cases appearing in a cluster five times or 995 times. What I did was to find the 2.5% and 97.5% cluster solutions out of the total simulated, and if they are the same then the solution is more stable.
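The Hamming-distance step described above amounts to nothing more than counting the positions at which two 0/1 membership strings disagree. A minimal Python illustration, with invented membership vectors (entry i is 1 if specimen i fell in the “recent” cluster in that simulation run, 0 otherwise):

```python
# Two hypothetical cluster-membership vectors from two simulation runs.
a = [1, 1, 0, 1, 0, 0, 1, 0]
b = [1, 0, 0, 1, 0, 1, 1, 0]

# Hamming distance: add up the positions where the strings disagree.
hamming = sum(x != y for x, y in zip(a, b))
print(hamming)  # 2
```

A distance of 0 between two runs means the two runs assigned every specimen to the same group, which is the stability being checked.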
The description on page 6 of the simulation is a bit hard to follow. “R” has both rlnorm() which simulates “raw scale” random deviates given a log mean and log standard deviation and rnorm() which could be used to simulate log scale deviates given a log mean and log standard deviation. Simplest would be to convert cranial capacities to log scale “up front” and then find the necessary statistics (specimen means and sd where available) in the log scale. And simulate in the log scale using rnorm(). Is that what was done?
-I don’t disagree with the above review. This was in fact what was done; the following line gives one example.
library(plyr)
# For each specimen (id, time), draw one simulated cranial capacity from a
# lognormal distribution using that specimen's log-scale mean and sd:
summarydat4 <- ddply(summarydat3, .(id, time), transform, value = rlnorm(1, mean, sd))
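For readers who prefer Python to R, the same per-specimen draw can be sketched with NumPy (the specimen rows below are invented, standing in for the columns of summarydat3). It also shows the point raised in the review: a lognormal draw parameterised on the log scale is identical in distribution to drawing with a normal on the log scale and exponentiating.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-specimen rows: (id, time in MYA, log-scale mean, log-scale sd).
specimens = [
    ("specimen_A", -1.8, np.log(850.0), 0.03),
    ("specimen_B", -0.3, np.log(1300.0), 0.03),
]

# One simulated raw-scale cranial capacity per specimen, i.e. what
# rlnorm(1, mean, sd) does within each ddply group in the R call above.
simulated = {sid: rng.lognormal(m, s) for sid, t, m, s in specimens}

# Equivalent route suggested by the reviewer: simulate on the log scale
# with a normal draw, then exponentiate back to the raw scale.
also = {sid: np.exp(rng.normal(m, s)) for sid, t, m, s in specimens}
```

Either route gives draws scattered multiplicatively around each specimen’s measured capacity, which is what the simulation step requires.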
Anyone interested in duplicating what the author has done is more than welcome to drop me a note requesting the underlying data.