Let’s talk about soil…

Here at Ag-Informatics, we spend a lot of time thinking about soil. Soil is a crucial component in many of our products and analyses. Obviously, soil characteristics are important for assessing the quality of a field and its potential crop yields. However, there are numerous other applications as well, especially when soil data is combined with our weather products, such as fieldwork analysis. As a grower with this information, you will know when it’s appropriate to drive through your field based on a combination of environmental variables and the properties of the specific soils in your field. More on this and other soil topics to come, but for now let’s explore the soil data itself.

Spatial soil data as a whole can be broken down into three parts: the map unit, the component, and the horizon. Although these names may differ between datasets, the relationship is the same. A map unit is a soil pattern repeating across a contiguous area, delineated as a polygon. Each map unit is assigned a unique identifier so that other information can be joined to this spatial representation. Each map unit is composed of one or more components, which are unmapped and are found to constitute some portion of the map unit based on the soil survey. Each component is assigned a percentage reflecting its share of the total map unit area. A component might be a major component of the map unit or a minor component, known as an inclusion. Each component also has its own unique identifier. Lastly, each component is composed of multiple horizons. The horizons are the layers seen in a cross-section of the soil component, much like the layers in a slice of layer cake. Horizons are assigned a depth or sequence value so that it can be determined in which order they are arranged, for example whether a horizon represents topsoil or subsoil.

This hand-drawn image depicts a map unit cross-section (although in reality the components are unmapped).
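For readers who think in code, the map unit, component, and horizon hierarchy described above can be sketched as three nested record types. This is only an illustration: the class names, field names, and sample values below are made up for the example and do not come from any particular dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Horizon:
    top_cm: float     # depth to the top of the layer
    bottom_cm: float  # depth to the bottom of the layer
    clay_pct: float   # one example property carried by a horizon


@dataclass
class Component:
    component_id: str
    share_pct: float  # share of the parent map unit, in percent
    is_major: bool    # False for a minor component (an inclusion)
    horizons: List[Horizon] = field(default_factory=list)

    def ordered_horizons(self) -> List[Horizon]:
        # Sort by top depth so that topsoil comes before subsoil.
        return sorted(self.horizons, key=lambda h: h.top_cm)


@dataclass
class MapUnit:
    map_unit_id: str  # key that joins attributes to the mapped polygon
    components: List[Component] = field(default_factory=list)


# Hypothetical map unit: one major component and one inclusion.
unit = MapUnit("MU1", [
    Component("C1", 85.0, True, [Horizon(25, 100, 32.0), Horizon(0, 25, 18.0)]),
    Component("C2", 15.0, False, [Horizon(0, 40, 45.0)]),
])
```

Only the map unit carries geometry; components and horizons hang off it through their identifiers, which is why the data has to be aggregated before it can be drawn on a map.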

Soil data, at a minimum, provides physical and chemical properties of the soil. Physical properties include the percent composition of sand, silt, and clay (which together describe texture), gravel percentage, depth, and water storage. Chemical properties include cation-exchange capacity (CEC), pH, organic carbon, and calcium carbonate content.

Because of the one-to-many relationships within the data, a method of aggregation must be chosen, and the right method depends on the specific needs of the analysis. For aggregating components to map units, values are often weighted by the component percentage. For horizon aggregation, values could be depth-weighted or summed, or only the top horizon may be selected.
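To make these aggregation options concrete, here is a minimal sketch in Python. The function names, the tuple layouts, the 50 cm cutoff, and the sample numbers are all assumptions chosen for the example; a real analysis would pick the method and depth that suit the property in question.

```python
def depth_weighted_mean(horizons, max_depth_cm):
    """Average a horizon property down to max_depth_cm, weighted by thickness.

    `horizons` is a list of (top_cm, bottom_cm, value) tuples.
    """
    weighted = 0.0
    thickness_total = 0.0
    for top, bottom, value in horizons:
        # Clip each horizon at the depth of interest.
        thickness = min(bottom, max_depth_cm) - top
        if thickness > 0:
            weighted += thickness * value
            thickness_total += thickness
    return weighted / thickness_total


def top_horizon_value(horizons):
    """Return the value of the shallowest horizon only."""
    return min(horizons, key=lambda h: h[0])[2]


def component_weighted_mean(components):
    """Average component values weighted by component percentage.

    `components` is a list of (share_pct, value) pairs. Dividing by the
    sum of the shares keeps the result sensible when they do not total 100.
    """
    total_share = sum(share for share, _ in components)
    return sum(share * value for share, value in components) / total_share


# Hypothetical clay percentages for two components of one map unit.
major = [(0, 25, 18.0), (25, 100, 32.0)]
inclusion = [(0, 40, 44.0)]

major_clay = depth_weighted_mean(major, max_depth_cm=50)          # 25.0
inclusion_clay = depth_weighted_mean(inclusion, max_depth_cm=50)  # 44.0
map_unit_clay = component_weighted_mean([(80, major_clay), (20, inclusion_clay)])
```

A property such as water storage would more naturally be summed across horizons than averaged, which is a plain `sum` over the same tuples.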

Every good analysis begins with good data, and in that respect we are fortunate to be able to use two excellent datasets: the Soil Survey Geographic Database (SSURGO) and the Harmonized World Soil Database (HWSD).

When working in the US, we use SSURGO, which is produced by the Natural Resources Conservation Service (NRCS) of the United States Department of Agriculture (USDA). This dataset is an ongoing effort and is the result of thousands of soil surveys conducted over the course of many years. There are still some areas that have yet to be surveyed, but most areas of agricultural significance are complete. The data have been collected at scales ranging from 1:12,000 to 1:63,360, resulting in detail such as this.

SSURGO is truly a great dataset which provides a wealth of information beyond the physical and chemical properties mentioned earlier. SSURGO provides information describing the engineering properties of the soil, as well as the suitability for different types of activity, including forestry and agriculture. For agriculture, the dataset even provides some estimated yields for crops common to the area.

We have obtained the complete SSURGO dataset (around 250 gigabytes zipped) from the NRCS and are actively working on importing the entire dataset into PostgreSQL, taking advantage of every opportunity to maximize performance. Our goal is to always have the most recent version of the dataset available for our analyses and products, in a form that lets us extract the most commonly needed information as quickly and easily as possible. Anyone who has worked with SSURGO in the past knows that this is not always the simplest task.

When working globally, we often use HWSD as a starting point. HWSD is provided by the Food and Agriculture Organization of the United Nations (FAO) and the International Institute for Applied Systems Analysis (IIASA). As the name implies, this dataset is actually a compilation of four regional and global soil datasets placed into a common schema. HWSD has been compiled at a resolution of about 1 kilometer (30 arc-seconds), making it much coarser than SSURGO, but still quite useful in its own right.

As a result of its coarseness, some of the additional properties that are present in SSURGO are not available in the base HWSD. Although it is less detailed, having access to a consistent global product is invaluable. One clear advantage of HWSD is its ease of use. The dataset is much smaller in size and is delivered as a single global table of attributes, which still requires aggregation. SSURGO, by contrast, delivers over 50 attribute tables per soil survey region, and these must be merged and then joined before finally being aggregated for an area of interest.
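The merge-then-join step for SSURGO can be sketched with plain Python dictionaries. This is a toy illustration, not our actual import process: the table names and the `mukey`, `cokey`, and `comppct_r` columns follow SSURGO's naming as best I recall, while the helper functions and all of the sample rows are hypothetical.

```python
from collections import defaultdict


def merge_tables(survey_areas, table_name):
    """Stack the rows of one table across all survey areas."""
    merged = []
    for area in survey_areas:
        merged.extend(area.get(table_name, []))
    return merged


def join_components_to_map_units(map_units, components):
    """Group component rows under the key of their parent map unit."""
    by_map_unit = defaultdict(list)
    for row in components:
        by_map_unit[row["mukey"]].append(row)
    return {mu["mukey"]: by_map_unit[mu["mukey"]] for mu in map_units}


# Two hypothetical survey areas, each delivering its own copy of the tables.
area_1 = {
    "mapunit": [{"mukey": "101"}],
    "component": [
        {"mukey": "101", "cokey": "101-a", "comppct_r": 90},
        {"mukey": "101", "cokey": "101-b", "comppct_r": 10},
    ],
}
area_2 = {
    "mapunit": [{"mukey": "202"}],
    "component": [{"mukey": "202", "cokey": "202-a", "comppct_r": 100}],
}

areas = [area_1, area_2]
joined = join_components_to_map_units(
    merge_tables(areas, "mapunit"),
    merge_tables(areas, "component"),
)
```

Only after this step can the components of each map unit be aggregated for an area of interest, whereas HWSD starts from a single table.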

I mentioned earlier that HWSD is actually a starting point for us. We have developed a process that incorporates ancillary datasets and performs considerable transformations on the base dataset, arriving at a product we refer to as HWSD-Plus. HWSD-Plus is significantly more useful than the base dataset and ultimately improves the quality of our analyses and products.

If you want to talk soils or anything else, shoot me an email at chad@ag-informatics.com.