Science Stories: Adventures in Bay-Delta Data

  • January 24, 2025

What lives in the mud?

(spoiler alert, not just clams)

By Rosemary Hartman, with advice from Betsy Wells.

Benthic samples (things that live in mud)

The Environmental Monitoring Program has been collecting data on water quality, nutrients, zooplankton, phytoplankton, and benthic invertebrates for almost 50 years. Data from the benthic invertebrate sampling program has been key to documenting the invasion of the clam Potamocorbula amurensis and the corresponding decrease in phytoplankton (Carlton et al. 1990; Kimmerer and Thompson 2014). However, the program catches a lot more than just clams. The crew brings up crustaceans, worms, amphipods, isopods, and lots of other critters you have probably never heard of. All of their data are published regularly on the Environmental Data Initiative website (Wells 2024), and there is a lot to be learned by looking through it.

What does sampling look like?

It’s not easy to look at what lives in mud that is 20 feet under water. EMP’s intrepid crew uses a ponar grab – a pair of metal “jaws” that can be held open until it hits a solid surface (like the river bottom). Then the weighted jaws snap shut, picking up a healthy helping of mud and associated critters (Figure 1). The survey crew then dumps the mud out into a mesh tray and slowly washes the mud away, leaving the critters.

Animated diagram showing how a ponar drops to the bottom of the ocean and clamps shut in the mud.
Figure 1. A gif demonstrating how a ponar grab works. A pair of metal "jaws" is lowered to the bottom of the water where it springs shut, scooping up a sample of mud and associated invertebrates.

What do they catch?

Well, when we look over the entire time period (1975-2023), 85% of the catch is made up of about 15 taxa (Figure 2, Figure 3). The most common is the invasive overbite clam, Potamocorbula amurensis. Second most common is a tube-dwelling amphipod, Americorophium stimpsoni. Next up is another amphipod, Ampelisca abdita, followed by the polychaete worm Manayunkia speciosa. The rest of the “usual suspects” include more polychaetes, several oligochaete worms, a few more amphipods, the Asian clam Corbicula fluminea, ostracods (also known as “seed shrimp”), and cumaceans (also called “comma shrimp”).
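The “85% of the catch from about 15 taxa” statistic is a cumulative-share calculation: total the catch of each taxon, rank the taxa, and add up the percentages of the top few. Below is a minimal Python sketch of that idea. It assumes the data have already been flattened into hypothetical (taxon, count) records, and the records are invented toy values, not EMP data. The published EMP tables on EDI have their own layout, which would need to be reshaped first.

```python
from collections import Counter


def catch_share(records):
    """Return (taxon, percent of total catch) pairs, most abundant first."""
    totals = Counter()
    for taxon, count in records:
        totals[taxon] += count
    grand_total = sum(totals.values())
    return [(taxon, 100 * n / grand_total) for taxon, n in totals.most_common()]


def top_n_share(records, n):
    """Cumulative percent of total catch made up by the n most abundant taxa."""
    return sum(pct for _, pct in catch_share(records)[:n])


# Hypothetical toy records of (taxon, count); NOT real EMP data.
records = [
    ("Potamocorbula amurensis", 500),
    ("Americorophium stimpsoni", 300),
    ("Ampelisca abdita", 120),
    ("Manayunkia speciosa", 60),
    ("Pinnixa scamit", 1),
    ("Potamocorbula amurensis", 19),
]
print(f"Top 2 taxa: {top_n_share(records, 2):.1f}% of catch")
```

The sketch simply sums whatever numbers it is given. If sampling effort differs between samples, catch per square meter may be a better quantity to total than raw counts.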

Interestingly, 41 species have only ever been recorded once in the history of the program (Figure 4, Figure 5)! These include several crabs, which are probably too fast to be caught more often (yellow rock crab – Metacarcinus anthonyi, blue-handed hermit crab – Pagurus samuelis, knobknee crestleg crab – Lophopanopeus leucomanus, and pea crab – Pinnixa scamit), the sea spider Ammothea hilgendorfi, eleven different species of midge larvae (family Chironomidae), a dragonfly nymph (the blue dasher, Pachydiplax longipennis), and a few more worms and amphipods.
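Finding taxa that were “only ever recorded once” comes down to counting the distinct samples each taxon appears in. Here is a minimal sketch, assuming hypothetical (sample_id, taxon, count) records. The sample IDs and values are made up for illustration.

```python
from collections import defaultdict


def taxa_seen_once(occurrences):
    """Return taxa that appear (count > 0) in exactly one sample.

    occurrences: iterable of (sample_id, taxon, count) tuples.
    """
    samples_by_taxon = defaultdict(set)
    for sample_id, taxon, count in occurrences:
        if count > 0:  # ignore zero-catch rows
            samples_by_taxon[taxon].add(sample_id)
    return sorted(t for t, s in samples_by_taxon.items() if len(s) == 1)


# Hypothetical occurrences for illustration only; not real EMP records.
occurrences = [
    ("1996-05-D7", "Potamocorbula amurensis", 40),
    ("1996-06-D7", "Potamocorbula amurensis", 12),
    ("2003-04-D4", "Metacarcinus anthonyi", 1),
    ("2010-03-C9", "Manayunkia speciosa", 8),
    ("2010-04-C9", "Manayunkia speciosa", 0),
    ("2011-03-C9", "Manayunkia speciosa", 5),
]
print(taxa_seen_once(occurrences))
```

Because the sketch counts distinct samples rather than individuals, a taxon caught as five animals in a single grab still counts as seen once.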

Pie chart showing the top 15 taxa caught by EMP's benthic survey.
Figure 2. Percent of total catch over the entire history of the EMP program (1975-2023) made up by the 15 most common taxa.

Photos of six common taxa: the head of M. speciosa, with lots of tentacles; Limnodrilus hoffmeisteri, which looks like an earthworm; Potamocorbula amurensis, a small, white clam; N. hinumensis, which looks like a shrimp with a fat head; C. fluminea, a dark, round clam; and A. spinicorne, which looks a bit like a shrimp.
Figure 3. Some of the most common taxa collected by EMP's benthic survey. Clockwise from top left: Manayunkia speciosa (a polychaete worm), Limnodrilus hoffmeisteri (an oligochaete worm), Potamocorbula amurensis (overbite clam), Nippoleucon hinumensis (a cumacean, or comma shrimp), Corbicula fluminea (Asian clam), and Americorophium spinicorne (an amphipod). All images from DWR's Environmental Monitoring Program, used with permission.

Timeline showing occurrences of rare taxa that were only found once in the history of the program, all of which occurred between 1996 and 2024. Insects were most common, followed by worms.
Figure 4. A timeline of instances when a species was found once in the EMP program, and never again.

Photographs of a large yellow crab, a crustacean that looks like a spider, and two worm-like midge larvae.
Figure 5. A few taxa from the Delta that have only been seen once! The yellow rock crab (Metacarcinus anthonyi), the sea spider (Ammothea hilgendorfi), and midge larvae (family Chironomidae, several species). Yellow rock crab picture from Harmonic at English Wikipedia (used under license CC BY-SA 3.0). Sea spider picture from The Trustees of the Natural History Museum, London (used under license CC BY). Midge larvae image from CDFW's Stockton lab.

Who is Manayunkia speciosa anyway?

One of the top players on our benthic team is the polychaete worm Manayunkia speciosa (first picture in Figure 3). If you’re not familiar with polychaetes, they are in the same phylum as earthworms (the annelids) but a different class (Polychaeta, not Oligochaeta). You can tell the difference because oligochaetes are very “worm shaped,” without a clear head and with only a few hairs. Polychaetes, on the other hand, have lots of spines and hairs all over them. They sometimes have leg-like fins that undulate along their sides, and they always have a distinct head. M. speciosa is a tube-dwelling worm, which means he sticks sand and mud together into a little house on the bottom of the river and lets his long, wavy feelers stick out, catching bits of food as they drift by. Most polychaetes are salt-water critters, but M. speciosa prefers fresh water, so he is found primarily at the freshwater stations sampled by EMP (Figure 6). M. speciosa is particularly important to the broader ecology of the Delta because it can carry the nasty salmon disease Ceratonova shasta, a myxozoan parasite (Foott 2017; Stocking et al. 2006).

Map of EMP's sampling stations with sizes based on average catch of M. speciosa. Catch is much higher in the eastern, freshwater regions.
Figure 6. Average catch per square meter (log-transformed) of M. speciosa at all of EMP’s freshwater stations since 2000.

One of the curious things about M. speciosa is that he can be very common, but not in every year. The average catch across all the freshwater stations ranges from a low of 7 individuals per square meter in 1978 to a high of 4,387 individuals per square meter in 1991 (Figure 7)! But why do we see these big swings? A lot of critters in the Delta have population swings driven by how much rain we get, so we can look for patterns based on water year type (broad categories of precipitation from critically dry to wet, indicated by colored point shapes in Figure 7). Many of the really high population spikes in M. speciosa occur during critically dry years. Other researchers have found that M. speciosa seems to do better in slow-moving water (Alexander et al. 2014), so maybe they get flushed out during high-flow years? But other high population years are categorized as “wet” or “above normal,” so that can’t be the only factor. An experiment by Malakauskas et al. (2013) found that while they can get dislodged at high flows, they have high survivorship after being dislodged, so high-flow events might just spread them around.
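The year-to-year swings and the comparison across water year types come from two averaging steps: average all freshwater samples within each year, then compare those annual averages among water year types. Below is a minimal Python sketch, assuming hypothetical (year, cpue) pairs in individuals per square meter and a hypothetical lookup table of water year types. The numbers are invented and are not EMP results.

```python
from collections import defaultdict
from statistics import mean


def annual_mean_cpue(samples):
    """Mean catch per square meter for each year, from (year, cpue) pairs."""
    by_year = defaultdict(list)
    for year, cpue in samples:
        by_year[year].append(cpue)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def mean_by_water_year_type(annual, wy_type):
    """Average the annual means within each water year type.

    wy_type maps year -> water year type label (e.g. "Critical", "Wet").
    """
    groups = defaultdict(list)
    for year, value in annual.items():
        groups[wy_type[year]].append(value)
    return {label: mean(values) for label, values in groups.items()}


# Hypothetical values for illustration; NOT real EMP catch or official
# water year classifications.
samples = [(1990, 100.0), (1990, 300.0), (1991, 4000.0), (1991, 4200.0), (1995, 50.0)]
wy_type = {1990: "Critical", 1991: "Critical", 1995: "Wet"}

annual = annual_mean_cpue(samples)
print(annual)
print(mean_by_water_year_type(annual, wy_type))
```

Averaging annual means gives every year equal weight, no matter how many samples were collected that year. Pooling all samples by water year type instead would give heavily sampled years more weight.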

The highest abundance of M. speciosa occurs in late winter and spring (Figure 8), the periods of highest flow in the Delta. This is a little different from the pattern in the Great Lakes, one of the few other places they’ve been studied, where peak abundance was in May-August (Schloesser et al. 2016). A study of lab-reared M. speciosa found that they have an annual life cycle and can reproduce throughout the year, but have the highest egg production in spring and summer, with babies staying in their mother’s tube for 4-6 weeks before emerging (Willson et al. 2010).
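A seasonal pattern like this one comes from pooling all years and averaging CPUE by calendar month. Here is a minimal sketch, assuming hypothetical (sample_date, cpue) pairs. The values are invented for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_mean_cpue(samples):
    """Mean CPUE by calendar month (1-12), pooling all years.

    samples: iterable of (sample_date, cpue) pairs.
    """
    by_month = defaultdict(list)
    for sample_date, cpue in samples:
        by_month[sample_date.month].append(cpue)
    return {month: mean(values) for month, values in sorted(by_month.items())}


def peak_month(monthly):
    """Calendar month with the highest mean CPUE."""
    return max(monthly, key=monthly.get)


# Hypothetical samples for illustration only; not real EMP data.
samples = [
    (date(2005, 3, 10), 900.0),
    (date(2006, 3, 14), 700.0),
    (date(2005, 4, 12), 600.0),
    (date(2005, 8, 9), 40.0),
    (date(2006, 8, 15), 60.0),
]
monthly = monthly_mean_cpue(samples)
print(monthly)
print(peak_month(monthly))
```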

Line graph showing mean annual M. speciosa abundance over time. Abundance peaked in 1976-76, 1991-1995, and 2005.
Figure 7. Average catch per unit effort (CPUE, individuals per square meter) of M. speciosa at all the freshwater stations sampled by EMP, 1975-2023.

Line graph showing average M. speciosa CPUE by month. There is much higher abundance in spring than summer.
Figure 8. Mean CPUE of M. speciosa by month for all the freshwater stations sampled by EMP, 2000-2023.

M. speciosa seems to prefer fresh water, and California has a lot of fresh water outside of the Delta. Where else is it found? The Surface Water Ambient Monitoring Program (SWAMP) conducts benthic invertebrate surveys all over the state; it is sponsored by the State Water Board and implemented by CDFW. It turns out that in over 34,000 samples collected by SWAMP since 2000, M. speciosa has been found only 118 times, and most of those detections were in the Delta (Figure 9). However, research on the Klamath River in northern California has found a lot of M. speciosa in that river, particularly in the slower reaches downstream of a major dam (Alexander et al. 2014; Stocking and Bartholomew 2007), so the lack of detections elsewhere may be more about “not knowing what to look for” than about the worm not being there. M. speciosa is also quite small and may be too small to be caught regularly by SWAMP’s sampling gear.
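To put the SWAMP numbers in perspective, the detection rate is simple arithmetic. Because SWAMP collected more than 34,000 samples, the value below is an upper bound.

```latex
\text{detection rate} < \frac{118}{34{,}000} \approx 0.0035 \approx 0.35\%
```

In other words, M. speciosa turned up in fewer than 1 of every 288 SWAMP samples.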

Map of all SWAMP Sampling sites distributed across California. Sites where M. speciosa has been found are highlighted. Most are near the Delta with only a few in other places.
Figure 9. Samples collected by the Surface Water Ambient Monitoring Program from 2000-2023, showing catch of polychaetes (including M. speciosa). Grey points indicate samples without polychaetes; colored circles indicate samples with polychaetes, with larger circles representing more individuals.

I wish I could end this blog post with a clear graph of something that drives the abundance of M. speciosa, but after two days of playing with the data, I haven’t found anything useful. So I will leave you with links to the data so you can figure it out for yourself! Let me know if you have any ideas.

Check out EMP's website for more annual reports and more background information!

References and further reading:

Categories: BlogDataScience, Underappreciated data
  • July 5, 2022

Authors: Rosemary Hartman and the IEP Data Utilization Work Group

Here at IEP, we collect a lot of data, and we do a lot of science. However, people haven’t always realized how much data we collect because it hasn’t always been easy to find. For scientists who were able to find the data, it was sometimes difficult to understand or shared in a hard-to-use format. That’s why IEP’s Data Utilization Work Group (DUWG) has been pushing for more Open Science practices over the past five years to make our data more F.A.I.R. (Findable, Accessible, Interoperable, and Reusable). And wow! We’ve come a long way in a short time.

A staircase with FAIR Principles written on it and stick figures climbing it. Circles are around the staircase. One shows a map pin that says Persistent and Findable. One shows an open lock that says 'Accessible' with meaningful interaction. One shows a person and a puzzle and says 'Reusable with Full Disclosure', and one shows two computers with a line between them and says 'Interoperable'.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY license. DOI: 10.5281/zenodo.3332807

What is Open Science anyway? Well, I was going to call it “the cool-kids club” but really it’s the opposite of a club! It’s the anti-club that makes sure everyone has access to science – no membership required. Open science means that all scientists communicate in a transparent, reproducible way, with open-access publications, freely shared data, open-source software, and openness to diversity of knowledge. Open science encourages collaboration and breaks down silos between researchers – so it’s a natural fit for a 9-member collaborative organization like IEP.

For IEP, the ‘open data’ component is where we’ve really been making strides. While “share your data freely” sounds easy, it’s actually taken a lot of work to make our data FAIR. Because IEP’s member agencies are government entities, all of the data we collect is, in theory, held in the public trust, but putting data in a format that other people can use is not simple. Here are some of the things we have done to make IEP data more open:

Data Management Plans

The first thing the DUWG did was get all IEP projects to fill out a simple, 2-page data management plan outlining what was being done with the data in short, clear sections:

  • Who: Principal investigator and point of contact for the data.
  • What: Description of data to be collected and any related data that will be incorporated into the analysis.
  • Metadata: How the metadata will be generated and where it will be made available.
  • Format: What format the data will be stored in and what format it will be shared in, which may not be the same. For example, you may store data in an Access database but share it in non-proprietary .csv formats.
  • Storage and Backup: Where you will put the data as you are collecting it and how it will be backed up for easy recovery. This is about short-term storage.
  • Archiving and Preservation: This is about long-term storage to keep your data for someone years down the line. This is best done with publication on a data archive platform, such as the Environmental Data Initiative (EDI).
  • Quality Assurance: Brief description of Quality Assurance and Quality Control (QAQC) procedures and where a data user can access full QAQC documentation.
  • Access and Sharing: How can users find your data? Is it posted online or available by request? Are there any restrictions on how the data can be used or shared?

You can find instructions (PDF) and a template (PDF) for Data Management Plans on the DUWG page. All of IEP’s data management plans are also posted on the IEP website.

Data Publication

Many IEP agencies were already sharing data on agency websites, but most of this was done without formal version control, machine-readable metadata, or digital object identifiers (DOIs), making it difficult to track how data were being used. IEP now recommends publishing data on EDI or other data archives. Published datasets have robust metadata, open, non-proprietary data formats (like .csv tables instead of Microsoft Access databases), and a DOI for each version, so studies using these data can be reproduced easily.
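To illustrate the “share it as .csv instead of Access” idea, here is a minimal sketch that uses Python's built-in csv module to write records as plain CSV text. It assumes the records have already been pulled out of the internal database as a list of dicts. The column names and values are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Write a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas are quoted automatically
    return buffer.getvalue()


# Hypothetical records, e.g. as queried out of an internal database.
rows = [
    {"SampleDate": "2023-03-14", "Station": "D7", "Taxon": "Potamocorbula amurensis",
     "Count": 12, "Notes": ""},
    {"SampleDate": "2023-03-14", "Station": "D7", "Taxon": "Ampelisca abdita",
     "Count": 3, "Notes": "partial grab, rocks in jaws"},
]
print(rows_to_csv(rows, ["SampleDate", "Station", "Taxon", "Count", "Notes"]))
```

The csv writer quotes any field that contains a comma, so free-text notes don't break the column structure when someone else reads the file in another tool.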

Cartoons of stick people illustrating the phases of the data life cycle with arrows connecting them. Data collection - People with nets catching shapes.  Data processing - people take shapes out of a box labeled short-term storage and lay them out on a table. Data Study and Analysis - people make patterns with the shapes. Data publishing and access - People present the data to an audience. Data Preservation - People put shapes in tubes and boxes. Data re-use - people open tubes and a string of shapes come out. Research ideas - Shapes inside a light bulb.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY license. DOI: 10.5281/zenodo.3332807

Metadata Standards

The term “metadata” can mean different things to different people. Some people may think it simply means the definitions of all the columns in a data set. Some people may think it means a history of changes to the sampling program. Some people think it’s your standard operating procedures. Some people may think it means data about social media networks. What is it? Well, it’s the “who, what, where, when, why, and how” of your data set. It should include everything a data user needs to understand your data as well as you do. The DUWG developed a template for metadata that includes everything we think you should include in full documentation for a dataset. Some of it might not apply to every dataset, but it is a good checklist to get you started.
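One way to use a metadata template is as a checklist that flags any section left blank. The sketch below reduces the “who, what, where, when, why, and how” framing to six hypothetical keys. It is not the structure of the DUWG template, which goes into much more detail.

```python
# Hypothetical simplification of "who, what, where, when, why, and how";
# NOT the actual DUWG metadata template.
REQUIRED_SECTIONS = ("who", "what", "where", "when", "why", "how")


def missing_metadata(metadata):
    """Return required sections that are absent or blank."""
    return [s for s in REQUIRED_SECTIONS if not str(metadata.get(s, "")).strip()]


metadata = {
    "who": "Principal investigator and data contact",
    "what": "Benthic invertebrate counts per grab, with column definitions",
    "where": "Station names and coordinates",
    "when": "Sampling dates and changes to the sampling schedule",
    "how": "Gear, processing, and identification methods",
}
print(missing_metadata(metadata))
```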

You can find the Metadata template (PDF) on the DUWG page.

QAQC standards

The DUWG is just starting to dig into QAQC. Quality assurance is an integrated system of management activities to prevent errors in your data, while quality control is a system of technical activities to find errors in your data. QAQC systems have become standard practice in analytical labs, but the formalization and standardization of QAQC practices is new for a lot of the fish-and-bug-counters at IEP. The DUWG QAQC sub-team developed a template for Standard Operating Procedures (PDF), and is working to provide guidance for QAQC of all types of data, and for integrating QAQC into all sampling programs. This promotes consistency across time, people, and space, increases transparency, and gives users more confidence in your data.
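A range check is one of the simplest quality control tests: compare each value to plausible limits and flag anything outside them for review. Here is a minimal sketch, assuming a hypothetical plausible range for water temperature of 0-35 degrees C. Real QAQC procedures would set limits for each parameter (and possibly each site) and document them in the SOP.

```python
def qc_flag(value, low, high):
    """Return a QC flag for one measurement.

    'missing' if no value was recorded, 'suspect' if outside [low, high],
    otherwise 'pass'. Flags mark values for review; nothing is deleted.
    """
    if value is None:
        return "missing"
    if not (low <= value <= high):
        return "suspect"
    return "pass"


# Hypothetical water temperatures (degrees C) and plausible limits.
temps = [14.2, None, 55.0, 18.9, -3.0]
flags = [qc_flag(t, low=0.0, high=35.0) for t in temps]
print(flags)
```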

Dataset integration

One of the great things about laying down a framework for open data that includes data publication, documentation, and quality control is that it becomes much easier to integrate datasets across programs. The IEP synthesis team (spearheaded by Sam Bashevkin of the Delta Science Program) has developed several integrated datasets that pull publicly accessible data together, convert them to a standard format, and publish them as single, easy-to-use datasets.
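Integration usually means mapping each program's column names and units onto one shared schema. Here is a minimal sketch, assuming two hypothetical surveys with different column names and temperature units. The column maps and values are invented and do not describe any IEP dataset or the synthesis team's actual code.

```python
# Hypothetical column names for two surveys; real IEP datasets differ.
COLUMN_MAPS = {
    "survey_a": {"Date": "date", "StationCode": "station", "Temp_C": "temperature_c"},
    "survey_b": {"SampleDate": "date", "Site": "station", "WaterTempF": "temperature_f"},
}


def harmonize(source, row):
    """Rename one row's columns to the shared schema and standardize units."""
    out = {"source": source}
    for old, new in COLUMN_MAPS[source].items():
        out[new] = row[old]
    if "temperature_f" in out:  # convert Fahrenheit to Celsius
        out["temperature_c"] = round((out.pop("temperature_f") - 32) * 5 / 9, 2)
    return out


integrated = [
    harmonize("survey_a", {"Date": "2020-06-01", "StationCode": "D7", "Temp_C": 21.5}),
    harmonize("survey_b", {"SampleDate": "2020-06-02", "Site": "D7", "WaterTempF": 70.7}),
]
for row in integrated:
    print(row)
```

Keeping a "source" column in the integrated table lets users trace each row back to the original program's published data.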

Spreading the Word

We’re also making sure EVERYONE knows how great our data are.

  • We’ve revamped the data access webpage on our IEP site.
  • Publishing data on EDI makes it available on DataOne, which allows searches across multiple platforms.
  • Publishing data papers is a relatively new way to let people know about a dataset. For example, this zooplankton data paper was recently published in PLOS ONE.
  • We’ve made presentations at the Water Data Science Symposium and other scientific meetings.
  • We published an Open Data Framework Essay in San Francisco Estuary and Watershed Sciences.
  • We also put on a Data Management Showcase (video) that you can watch via the Department of Water Resources YouTube Channel.
  • Plus, we have lots more data management resources available on the DUWG website.

Together, we're putting IEP Data on the Open Science Train to global recognition. 

Questions? Feel free to reach out to the DUWG co-chairs: Rosemary Hartman and Dave Bosworth. If you have any suggestions for improving data management or sharing, we want to hear them.

Two birds are in a fountain labeled Fountain of Open Data. One asks: You mind if I reuse this data? The other says: Go ahead! We can even work together on it.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY license. DOI: 10.5281/zenodo.3332807

Further Reading

Categories: BlogDataScience
  • May 17, 2021

One of life’s greatest joys is playing with data. However, not everyone has the time or experience needed to make fancy graphs. Fortunately, the number of online web applications that let people with no data analysis experience visualize status and trends across space and time has exploded in recent years.

Three fish look at a graph. One says 'I want to make graphs, but I can't type with fins'. Another says 'I don't even know where to get the data!'. The third says 'Don't worry, there are lots of apps that you can use to graph the data automatically.'
Figure 1. Fish love data, but they need a little help making their graphs.

Some of the first data visualization tools were the mapping widgets on the CDFW website. These maps let you plot the catch of different fish species as bubbles of different sizes, and they have been available since the late 1990s.
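Bubble maps raise a small design question: how big should each bubble be? A common convention, assumed here rather than taken from the CDFW widgets, is to make bubble area (not radius) proportional to catch, so the radius scales with the square root of catch. Here is a minimal sketch with hypothetical station names and catches.

```python
import math


def bubble_radii(catches, max_radius=20.0):
    """Scale radii so bubble *area* is proportional to catch.

    The largest catch gets max_radius; a catch of zero gets radius 0.
    """
    biggest = max(catches.values())
    if biggest == 0:
        return {station: 0.0 for station in catches}
    return {
        station: max_radius * math.sqrt(n / biggest)
        for station, n in catches.items()
    }


# Hypothetical catch of one species at three stations.
catches = {"Station 1": 400, "Station 2": 100, "Station 3": 0}
print(bubble_radii(catches))
```

If the radius were scaled linearly instead, a catch four times larger would look sixteen times bigger on the map.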

But we needed better ways to display data from multiple surveys at once at the click of a button. The website Bay Delta Live was launched in 2007 as a home for Bay-Delta data and data visualizations. It includes summaries, graphs, and interactive visualizations for water quality, operations, fish monitoring, and special studies.

A similar website, SacPas, was built specifically for synthesizing, summarizing, and displaying data for salmonids in the Central Valley. It allows a user to visualize data on salmon abundance, temperature thresholds, river conditions, and hydrologic conditions. It also lets you play with a nifty Chinook Salmon population model and download all the underlying data.

Three fish look at a map. The tule perch says 'This app lets you see how much flow you need to get different amounts of salmon habitat.' The splittail says 'This is great! Where is the splittail habitat app?'
Figure 2. FlowWest's Central Valley Instream Rearing Habitat Calculator Shiny app.

Custom-built websites like Bay Delta Live and SacPas are great, but they are built by web developers, not fisheries scientists. Now, thanks to user-friendly data display tools such as Tableau and the increase in coding literacy among environmental scientists, more and more people can create their own online data visualizations. As a result, the number of data visualization apps has grown astronomically in the past few years, and many apps are custom-built for specific scientific questions.

The Delta Science Program now hosts a number of these visualizations built with the R package “shiny”.

Three fish look at a map. The tule perch says 'This app lets you make maps of all the IEP fish sampling stations.' The striped bass says 'Oh, good, now I know all the places I should avoid.'
Figure 3. You can now map all the stations monitored by IEP's long-term surveys.

Other Shiny apps have recently launched on a variety of other platforms.

Three fish look at a graph of salmon survival. The Tule Perch says 'You can use the STARS model to look at survival probabilities'. The splittail says 'I'm glad I don't have to migrate through the Delta.'
Figure 4. CalFishTrack includes a Shiny App of their Survival Travel time And Routing Simulation (STARS).

USGS has developed several new dashboards for mapping water and water quality data.

With all these tools out there, it’s one big data playground! If you’re interested in making your own, it’s easy to get started with Shiny. Visit the Learn Shiny video tutorial!

Categories: General