WMC / Bug Lab Database Queries
The Western Center for Monitoring and Assessment of Freshwater Ecosystems (WMC) and the National Aquatic Monitoring Center (NAMC) jointly maintain a database of biological and environmental data collected from streams and other aquatic ecosystems throughout the western United States and elsewhere. At present, the vast majority of the data are for macroinvertebrates; data for fish and algae may become available in the future. Users can download data with either of two tools: MAPIT*, the Mapping Application for Freshwater Invertebrate Taxa, and the Sample Query Tool**. MAPIT lets users download occurrence data for selected taxa and produce a Google map showing the distribution of those records. The Sample Query Tool lets users select sets of multi-taxon samples and download both biological and associated environmental data in either comma-delimited (*.csv) or tab-delimited (*.txt) format.
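Files downloaded from either tool can be read with a few lines of code. The following Python sketch is not part of either tool: the function name read_download is hypothetical, and it assumes that the first row of the file is a header row and that the file is UTF-8 encoded. It chooses the delimiter from the file extension, using a comma for *.csv and a tab for *.txt.

```python
import csv
from pathlib import Path


def read_download(path):
    """Read a downloaded query file into a list of dicts, one per row.

    Hypothetical helper. Assumes a header row and UTF-8 text; the
    "utf-8-sig" codec also tolerates a leading byte-order mark.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        delimiter = ","   # comma-delimited download
    elif suffix == ".txt":
        delimiter = "\t"  # tab-delimited download
    else:
        raise ValueError(f"unexpected file type: {suffix!r}")
    # newline="" lets the csv module handle quoted fields correctly.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

Each row comes back as a dictionary keyed by the column names in the header, with all values as strings; numeric columns such as counts need to be converted before analysis.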
The Database – Our database contains over 1.3 million records of freshwater invertebrate taxa collected from aquatic ecosystems in the United States. These records come from over 44,000 samples collected over more than 40 years and obtained from a variety of sources, and new records are continually being added. Most of the records were derived from multi-taxon samples collected from streams and rivers in the western USA by USDI Bureau of Land Management (BLM) or USDA Forest Service (USFS) personnel or by Utah State University (USU) researchers, but national-scale data associated with EPA projects (EMAP and the National Aquatic Resource Surveys) and USGS NAWQA surveys (reference sites only) are also included. Most BLM, USFS, and USU samples were processed (i.e., their taxa were identified) by the NAMC’s identification lab (aka the USU/BLM Bug Lab). Note that genus-level identifications are most common for insect taxa, and family-level identifications are most common for non-insect taxa. Records include data from both fixed-area and qualitative samples.
Data Accuracy – We have tried to maintain high data standards in terms of taxonomic accuracy and site locations. However, we cannot guarantee the accuracy of all specimen identifications, and we know or suspect that some individuals in some samples were misidentified. We are continually working to correct obvious problems. We have not attempted to reconcile changes in taxonomy that may have occurred over the more than 40 years covered by the database. We also have not verified the site locations (latitude/longitude) for all records. Some geographic coordinates may be imprecise, and a few are known to be wrong; in those cases we are working with the original data source to obtain correct coordinates. If you notice any obvious errors in the data you download, please email either Chuck Hawkins (email@example.com) or Scott Miller (firstname.lastname@example.org).
MAPIT – This tool lets you query specific taxa of interest and see where they have been collected. Because the database is large, however, queries can take a significant amount of time to run. To minimize query times, we recommend against searching for higher taxonomic groups (e.g., order and above). Refining queries by project (data source), state, or year will also reduce query times, and queries can be further refined by sampling method and the type of habitat sampled. Even when queries are narrowly refined, the number of records can be so large that creating location maps takes an excessive amount of time. We therefore designed the mapping function and the data window to use no more than 2,000 records at a time, which should take less than 2 minutes to map. Using the Results Page function, you can step through successive batches of up to 2,000 records. Note, however, that when you download data, all records (not just those in the data window and map) are saved to the *.csv file.
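The paging arithmetic is simple: a query that returns N records is split into ceil(N / 2,000) result pages. The following minimal Python sketch assumes that pages are consecutive, non-overlapping slices of the result set in the order returned; the function name result_pages is hypothetical and not part of MAPIT.

```python
PAGE_SIZE = 2000  # maximum records shown in the data window and map at once


def result_pages(n_records, page_size=PAGE_SIZE):
    """Return (start, stop) index pairs, one pair per results page.

    Hypothetical helper; the last page holds whatever records remain.
    """
    if n_records < 0 or page_size <= 0:
        raise ValueError("n_records must be >= 0 and page_size > 0")
    return [(start, min(start + page_size, n_records))
            for start in range(0, n_records, page_size)]
```

For example, a query that returns 4,500 records yields three pages: (0, 2000), (2000, 4000), and (4000, 4500). The download, by contrast, covers the full range (0, 4500).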
Start your query by typing the taxon name of interest in the query window. As you type, MAPIT shows a list of taxon names (and other words) in the database that contain the sequence of letters (and other symbols) you have typed. Either continue typing or select one of the entries that appear below the query window. If you do not want to filter the query, just click ‘search’, and the query will return all records for the selected taxon. If you want to filter your query, you have two options. You can run a series of increasingly narrow queries by sequentially clicking on categories of records in the filter menu on the left side of the page (e.g., subcategories under ProjectName, State, SampleDate, etc.), or you can type a combination of filter terms into the query window. For example, to get records of the elmid beetle Cleptelmis from California in 2009 only, type “Cleptelmis CA 2009” in the query window and then click ‘search’. Note that taxon names are not case sensitive but filter terms are, so type filter terms exactly as they appear in the filter menu: “cleptelmis CA 2009” works, but “Cleptelmis ca 2009” will not return any records. This simultaneous filtering approach is much faster than the sequential approach. The number in parentheses to the right of each filter category is the number of records in that category.
Also note that if you type the name of a site or station that is in the database, the query will return all taxa collected from that site. For example, searching for “Yuba” returns records for several sites in the Yuba River basin of California and one site on Yuba Creek in Idaho. Typing “Yuba Ephemerellidae” returns records of Ephemerellidae taxa for sites or stations with Yuba in their names.
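The case rules for combined queries can be modeled in a few lines. This is a hypothetical Python sketch, not MAPIT's actual code: the Record class, its filters field, and the rule that every query word must match either the record's taxon name (ignoring case) or one of its filter terms (exactly) are assumptions chosen to reproduce the behavior described for “Cleptelmis CA 2009”. Site-name searches and autocompletion are not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one MAPIT occurrence record."""
    taxon: str  # taxon name, e.g. "Cleptelmis"
    # Filter-menu terms attached to the record, e.g. {"CA", "2009"}.
    filters: frozenset = field(default_factory=frozenset)


def word_matches(word, record):
    """Return True if a single query word matches the record."""
    # Taxon names are compared without regard to case ...
    if word.casefold() == record.taxon.casefold():
        return True
    # ... but filter terms must match exactly, including case.
    return word in record.filters


def run_query(records, query):
    """Return the records for which every word in the query matches."""
    words = query.split()
    if not words:
        return []
    return [r for r in records if all(word_matches(w, r) for w in words)]
```

Under this model, “cleptelmis CA 2009” returns only Cleptelmis records tagged with both CA and 2009, while “Cleptelmis ca 2009” returns nothing because “ca” matches neither the taxon name nor any filter term exactly.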
Mapping Options – There are two map display options. The ‘Pins’ option places a pin icon at each location in your query results; you can click a pin to see site information. This option is slower than the ‘Site Cluster’ option, which creates a heat map of site clusters. By clicking on individual clusters, you can zoom in to see increasingly resolved clusters of sites.
The Sample Query Tool – To use the Sample Query Tool, select the primary type of data of interest and click ‘Get Data’. On the data filter screen, select the project, state, and year of interest; whether you want data only from reference-quality sites; and the level of taxonomic aggregation (OTU level).
Biological data can be downloaded with taxon identifications in raw form (i.e., at the taxonomic level originally reported), or the data can be rolled up to one of three standardized taxonomies based on different operational taxonomic unit (OTU) specifications: OTU-0 = raw data; OTU-1 = highest resolution possible for each major taxonomic grouping; OTU-2 = same as OTU-1 except that chironomid midges are rolled up to subfamily; OTU-3 = all taxa rolled up to family, except that chironomid midges are rolled up to subfamily and a few groups (e.g., Acari) are rolled up to higher taxa. For OTU-1 through OTU-3, every individual is either assigned to an unambiguous taxonomic level or dropped from the data.
Also select whether you want to resample the data to a fixed count (this option may greatly increase query times). Finally, select the sample information you want associated with the data (project name, station, etc.). To minimize file size, query results do not include the taxonomic hierarchy of each taxon. Users can, however, download both the Taxonomy Table, which links taxa to their full taxonomic hierarchy, and the Taxa-OTU Translation Table, which maps original laboratory identifications to the different OTU schemes. After the query runs, you can extract additional data associated with the queried samples where such data exist (e.g., GIS and habitat data). You can also save your query results as either a *.csv or *.txt file.
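The OTU roll-up and fixed-count resampling steps can be illustrated in Python. This sketch is not the Sample Query Tool's code: the functions roll_up and resample are hypothetical, a sample is represented as a dictionary of taxon counts, and the translation dictionary stands in for one OTU column of the Taxa-OTU Translation Table, with None marking identifications that are dropped. Treating taxa missing from the table as dropped, drawing individuals at random without replacement, and returning samples that are already at or below the fixed count unchanged are all assumptions.

```python
import random
from collections import Counter


def roll_up(counts, translation):
    """Sum a sample's counts by OTU name.

    `counts` maps original identifications to numbers of individuals;
    `translation` maps original identifications to OTU names (a
    hypothetical stand-in for one column of the Taxa-OTU Translation
    Table). Identifications that map to None, or that are missing from
    the table (an assumption), are dropped.
    """
    rolled = Counter()
    for taxon, n in counts.items():
        otu = translation.get(taxon)
        if otu is not None:
            rolled[otu] += n
    return dict(rolled)


def resample(counts, fixed_count, seed=None):
    """Randomly draw `fixed_count` individuals without replacement.

    Assumption: samples with no more than `fixed_count` individuals
    are returned unchanged rather than discarded.
    """
    # Expand the counts into one list entry per individual.
    individuals = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(individuals) <= fixed_count:
        return dict(counts)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return dict(Counter(rng.sample(individuals, fixed_count)))
```

With hypothetical OTU-2 entries that map Cricotopus and Eukiefferiella to Orthocladiinae, keep Baetis at genus, and map a family-level Chironomidae identification to None, roll_up({"Baetis": 10, "Cricotopus": 4, "Eukiefferiella": 3, "Chironomidae": 2}, ...) returns {"Baetis": 10, "Orthocladiinae": 7}.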
Associated environmental data can also be downloaded. At present, these data consist of site information, including whether each site is a reference site; climate, geology, topography, and soils data extracted with a geographic information system (GIS); and some local habitat data. Users should be aware that the environmental data are incomplete.
Please Acknowledge Us – You are welcome to use data from the WMC/NAMC database freely, but we ask that you acknowledge the data’s sources and our web services in your presentations, reports, or publications. Your use and acknowledgement of the database ultimately generate the support that allows us to maintain this resource. We welcome opportunities to collaborate on data analyses; in such cases, we can often provide additional supporting data that are not available through our query tools.
*MAPIT was developed by Sirisha Pratha with support from the WMC and NAMC.
**The Sample Query Tool was developed by Sanket Korgaonkar and Padma Prathipati with support from the USGS NAWQA program.