Jun 19, 2010

Why should scientific papers be "spatially enabled"?

Now that I’m starting to build the databases needed for my new lithological database, I’m coming back to how I created my Devonian database. The papers I generally worked with contained reports from the field, including lithology, measurements, location, etc. That can be a LOT of information. Collecting it all from each paper is time-consuming, to say the least. However, there was another problem…

That problem is being overly focused on the data in front of you and not the data you need. The forest for the trees problem, if you will. In the earth sciences, there are a number of research biases. North America and Europe are far better studied than Africa, for example. Thus, most publications are focused in those regions. Similarly, some specific localities can be studied extensively, because of location or because of something interesting, while others are rarely visited. This becomes a problem when you keep entering papers from the same area but miss important work from more rarely studied areas.

To combat this problem for the Devonian database, I created a “recon” or “search” database. I tried to find any paper that might be relevant to the project and collect some basic information, such as the time range and the general lat/lon area of the field study. I could then map these records in a GIS application (at the time, I was using MapInfo, Terra Mobilis, and PGIS).
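As a rough sketch of what one such recon record could look like in code (this is not the original MapInfo setup), here is a small Python example. The class name, field names, and sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperCoverage:
    """Hypothetical recon record: one paper and the rough area it covers."""
    citation: str
    time_range: str   # free-text age, e.g. a stage or epoch name
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def __post_init__(self):
        # Reject boxes that are inverted or fall outside valid coordinates.
        # Note: this simple form cannot represent a box crossing the antimeridian.
        if not (-90 <= self.min_lat <= self.max_lat <= 90):
            raise ValueError("latitudes must satisfy -90 <= min <= max <= 90")
        if not (-180 <= self.min_lon <= self.max_lon <= 180):
            raise ValueError("longitudes must satisfy -180 <= min <= max <= 180")

    def contains(self, lat: float, lon: float) -> bool:
        """Return True if the point lies inside (or on the edge of) the box."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Made-up record, for illustration only.
example = PaperCoverage(
    citation="Hypothetical et al. (1990)",
    time_range="Late Devonian",
    min_lat=40.0, max_lat=42.5,
    min_lon=-80.0, max_lon=-75.0,
)
```

A bounding box like this is deliberately crude. It only needs to say roughly where the field area is, so it can be filled in from a quick skim of the paper.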

As an example, I found about 500 of these records remaining in my archives. Here is a global map example:

The yellow dots are entries in the Devonian Lithological Database. The blue rectangles are “coverages” for particular scientific papers. Where papers overlap, the blue color gets darker. This is more evident regionally, for example:

As you can see, I can now show the data I have versus the field areas represented by papers I’ve found. Careful examination of this sort of map highlights both the papers I might not need to bother with (blue rectangles with lots of yellow dots) and the papers I should prioritize (blue rectangles with few if any yellow dots).
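The same comparison can be done without a map by counting how many existing data points fall inside each paper's rectangle. The following is a minimal Python sketch under that assumption. The function names, paper labels, and coordinates are made up for illustration, and boxes are given as (min_lat, min_lon, max_lat, max_lon).

```python
def count_points_in_box(box, points):
    """Count the (lat, lon) points that fall inside a bounding box."""
    min_lat, min_lon, max_lat, max_lon = box
    return sum(
        1
        for lat, lon in points
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    )


def rank_papers(coverages, points):
    """Return (paper, count) pairs, papers with the fewest existing points first."""
    counts = [
        (name, count_points_in_box(box, points))
        for name, box in coverages.items()
    ]
    return sorted(counts, key=lambda pair: pair[1])


# Hypothetical coverages (the blue rectangles) and database entries (the yellow dots).
coverages = {
    "paper_a": (40.0, -80.0, 42.0, -75.0),
    "paper_b": (-30.0, 20.0, -25.0, 28.0),
}
points = [(41.0, -78.0), (40.5, -76.0), (41.5, -79.0)]

ranking = rank_papers(coverages, points)
print(ranking)  # [('paper_b', 0), ('paper_a', 3)]
```

Papers at the top of the ranking cover areas with little or no data yet, so they are the ones worth reading first.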

These maps by no means represent all the papers I looked at in developing the database. I think I physically looked at 3000-4000 papers at least, but only 500 are represented in the above maps. So, to include everything would take a great deal of work.

In any case, I hope this short example has shown that, in at least one case, geospatially enabled papers can be very important. Now, the question is how to implement it!