Science Stories: Adventures in Bay-Delta Data

  • August 29, 2022
Many small crabs running around on a tray

More underappreciated data!

This is the second blog in our series on underutilized datasets from IEP.

San Francisco Bay Study’s Crab Catch dataset

Curated by Kathy Hieb and Jillian Burns

The San Francisco Bay Study has been sampling with otter trawls and midwater trawls throughout the San Francisco Bay, Suisun Bay, and Delta since 1980. Their fish data have been used in a number of scientific studies, regulatory decisions, and journal articles. However, did you know they measure and count crabs in their nets too?

Bay Study’s stations are all categorized as “Shoal” (shallow areas) or “Channel” (deeper areas). Crabs are collected by otter trawl, which is towed along the bottom of the water, scraping up whatever demersal fishes and invertebrates it comes across. Truth be told, it’s not the best way to catch crabs, because most crabs like hiding under rocks where they are out of the way of the net, but it does give us a metric of status and trends for some of the most common species of crabs, including the Pacific rock crab (Romaleon antennarium), the graceful rock crab (Cancer gracilis, also known as the slender rock crab), the red rock crab (Cancer productus), and everyone’s favorite, the Dungeness crab (Metacarcinus magister).

After the net has been towed on the bottom for five minutes, it’s brought on board the boat and the biologists count, measure, and sex the crabs they’ve caught (Figure 1). This can be tricky, because crabs can be FAST! Especially the smaller Dungeness crabs (Figure 2). The biologists have to be careful and pick up the crabs by their back side to avoid getting pinched by their claws, which definitely takes practice.

a large crab is held by the back of its shell and is being measured with calipers
Figure 1. Each crab is carefully measured using calipers. Even experienced biologists have to hold the crabs carefully to avoid being pinched. Image credit: Lynn Takata, Delta Science Program.
tray full of several dozen small crabs
Figure 2. Lots of little crabs! Juvenile crabs can be particularly hard to catch, and particularly hard to tell apart. Image credit: Kathy Hieb, CDFW.

Once all the crabs are counted and measured, the data are entered into a database that goes back to 1980. Bay Study’s Dungeness crab data have been used to help manage the commercial crab fishery because they provide valuable fishery-independent information: from 1975 to 1978, an estimated 38-82% of the Dungeness crabs in the central California region reared in the San Francisco Estuary each year (Wilde and Tasto 1983). This dataset was also very helpful in tracking the introduction, expansion, and decline of the Chinese mitten crab (Eriocheir sinensis), which briefly took over the brackish regions of the estuary but declined as rapidly as it arrived (Figure 3; Rudnick et al. 2003). Bay Study’s crab data have also been combined with other datasets to see how the estuarine community as a whole responds to climate patterns and human impacts (Cloern et al. 2010).

line graph showing annual average catch per trawl of five species of crabs caught by Bay Study in each region of the Estuary (South Bay, Central Bay, San Pablo, Suisun, and the West Delta) - click to enlarge in new window
Figure 3. Annual mean catch per trawl of the most common species of crabs across each region of the estuary. Dungeness crabs are the most frequently caught, with peaks in South Bay, Central Bay, and San Pablo in 2013 and 2016. Chinese mitten crabs had a spike in abundance in Suisun and the West Delta around 2002, but are rarely caught before or after. The red rock crab, graceful rock crab, and Pacific rock crab are only caught in South Bay, San Pablo, and Central Bay, and then only in low abundances. Click image to enlarge.

However, many questions remain to be asked of this dataset. Why did we see such high catches of Dungeness crabs in 2013 and 2016? What drives the populations of lesser-studied crabs, such as the graceful rock crab? How do the salinity preferences of the different species differ (Figure 4)? If you want to investigate these questions yourself, the data are available on the CDFW file library website. But be careful: the data have a few hiccups in them, such as changes to sampling sites over time, missing samples during periods of boat breakdowns, and other caveats. Be sure to read the metadata and make sure you understand the data before using them.
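If you do download the catch table, a metric like the annual mean catch per trawl (as in Figure 3) is simple to compute, as long as you skip missed samples rather than counting them as zero catch. The sketch below is a minimal example assuming a hypothetical CSV layout (`Year`, `Species`, `Count` columns, with `Count` left blank for missed trawls); the real column names and codes are documented in the metadata, so check there before relying on anything like this.

```python
import csv
from collections import defaultdict
from io import StringIO

# Hypothetical extract of the crab catch table; real column names
# and codes will differ -- always check the metadata first.
raw = """Year,Species,Count
2013,Dungeness crab,12
2013,Dungeness crab,7
2013,Dungeness crab,
2016,Dungeness crab,9
"""

totals = defaultdict(lambda: [0, 0])  # (year, species) -> [total catch, n trawls]
for row in csv.DictReader(StringIO(raw)):
    if row["Count"] == "":   # missing sample (e.g., a boat breakdown)
        continue             # skip it rather than treating it as zero catch
    key = (int(row["Year"]), row["Species"])
    totals[key][0] += int(row["Count"])
    totals[key][1] += 1

cpue = {k: total / n for k, (total, n) in totals.items()}
print(cpue)  # mean catch per trawl by year and species
```

The same skip-or-flag decision applies to the other caveats (station changes, gear changes): the point is that each one needs an explicit choice, not a silent default.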

dot plot showing the salinity at which each species of crab is caught - click to enlarge in new window
Figure 4. Dot plot showing the salinity of each trawl where each species was found from 1995-2005. The Pacific rock crab, graceful rock crab, and red rock crab mostly occur at high salinity (25-32 PSU), but the Dungeness crab is often found in brackish water (10-32 PSU), and the Chinese mitten crab was found in fresh to brackish water and was mostly absent from high-salinity water (anything greater than 28 PSU). Click image to enlarge.

Further reading

Categories: BlogDataScience, Underappreciated data
  • July 5, 2022

Authors: Rosemary Hartman and the IEP Data Utilization Work Group

Here at IEP, we collect a lot of data, and we do a lot of science. However, people haven’t always realized how much data we collect because it hasn’t always been easy to find. For scientists that were able to find the data, sometimes it was difficult to understand or it was shared in a hard-to-use format. That’s why IEP’s Data Utilization Work Group (DUWG) has been pushing for more Open Science practices over the past five years to make our data more F.A.I.R (Findable, Accessible, Interoperable, and Reusable). And wow! We’ve come a long way in a short time.

A staircase with FAIR Principles written on it and stick figures climbing it. Circles are around the staircase.  One shows a map pin that says Persistent and Findable. One shows an open lock that says 'Accessible' with meaningful interaction. One shows a person and a puzzle and says 'Reusable with Full Disclosure', and one shows two computers with a line between them and says 'Interoperable'.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY license. DOI: 10.5281/zenodo.3332807

What is Open Science anyway? Well, I was going to call it “the cool-kids club” but really it’s the opposite of a club! It’s the anti-club that makes sure everyone has access to science – no membership required. Open science means that all scientists communicate in a transparent, reproducible way, with open-access publications, freely shared data, open-source software, and openness to diversity of knowledge. Open science encourages collaboration and breaks down silos between researchers – so it’s a natural fit for a 9-member collaborative organization like IEP.

For IEP, the ‘open data’ component is where we’ve really been making strides. While “share your data freely” sounds easy, it’s actually taken a lot of work to make our data FAIR. As government entities, theoretically all of the data we collect is held in the public trust, but putting data in a format that other people can use is not simple. Here are some of the things we have done to make IEP data more open:

Data Management Plans

The first thing the DUWG did was get all IEP projects to fill out a simple, 2-page data management plan outlining what was being done with the data in short, clear sections:

  • Who: Principal investigator and point of contact for the data.
  • What: Description of data to be collected and any related data that will be incorporated into the analysis.
  • Metadata: How the metadata will be generated and where it will be made available.
  • Format: What format the data will be stored in and what format it will be shared in, which may not be the same. For example, you may store data in an Access database but share it in non-proprietary .csv formats.
  • Storage and Backup: Where you will put the data as you are collecting them and how they will be backed up for easy recovery. This is about short-term storage.
  • Archiving and Preservation: This is about long-term storage to preserve your data for users years down the line. This is best done by publishing on a data archive platform, such as the Environmental Data Initiative (EDI).
  • Quality Assurance: Brief description of Quality Assurance and Quality Control (QAQC) procedures and where a data user can access full QAQC documentation.
  • Access and Sharing: How can users find your data? Is it posted online or available by request? Are there any restrictions on how the data can be used or shared?

You can find instructions (PDF) and a template (PDF) for Data Management Plans on the DUWG page. All of IEP’s data management plans are also posted on the IEP website.

Data Publication

Many IEP agencies were already sharing data on agency websites, but most of this was done without formal version control, machine-readable metadata, or digital object identifiers (DOIs), making it difficult to track how data were being used. Now IEP is recommending publishing data on EDI or other data archives. Datasets now have robust metadata, open-source data formats (like .csv tables instead of Microsoft Access databases), and DOIs for each version so studies using these data can be reproduced easily.

Cartoons of stick people illustrating the phases of the data life cycle with arrows connecting them. Data collection - People with nets catching shapes.  Data processing - people take shapes out of a box labeled short-term storage and lay them out on a table. Data Study and Analysis - people make patterns with the shapes. Data publishing and access - People present the data to an audience. Data Preservation - People put shapes in tubes and boxes. Data re-use - people open tubes and a string of shapes come out. Research ideas - Shapes inside a light bulb.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY license. DOI: 10.5281/zenodo.3332807

Metadata Standards

The term “metadata” can mean different things to different people. Some people may think it simply means the definitions of all the columns in a data set. Some people may think it means a history of changes to the sampling program. Some people think it’s your standard operating procedures. Some people may think it means data about social media networks. What is it? Well, it’s the “who, what, where, when, why, and how” of your data set. It should include everything a data user needs to understand your data as well as you do. The DUWG developed a template for metadata that includes everything we think you should include in full documentation for a dataset. Some of it might not apply to every dataset, but it is a good checklist to get you started.
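In code terms, a metadata template is really a completeness check: a list of required sections you can tick off against a draft. The sketch below uses the “who, what, where, when, why, and how” framing as a hypothetical checklist; the actual DUWG template is the authoritative list of sections.

```python
# Hypothetical checklist based on the "who, what, where, when, why, how"
# framing; the DUWG metadata template is the real authority.
REQUIRED = ["who", "what", "where", "when", "why", "how"]

def missing_sections(metadata):
    """Return required sections that are absent or left empty."""
    return [s for s in REQUIRED if not metadata.get(s)]

draft = {"who": "Survey PI", "what": "Crab catch", "where": "SF Estuary"}
print(missing_sections(draft))  # ['when', 'why', 'how']
```

Not every section applies to every dataset, so in practice a reviewer decides which gaps matter; the check just makes the gaps visible.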

You can find the Metadata template (PDF) on the DUWG page.

QAQC standards

The DUWG is just starting to dig into QAQC. Quality assurance is an integrated system of management activities to prevent errors in your data, while quality control is a system of technical activities to find errors in your data. QAQC systems have become standard practice in analytical labs, but the formalization and standardization of QAQC practices is new for a lot of the fish-and-bug-counters at IEP. The DUWG QAQC sub-team developed a template for Standard Operating Procedures (PDF), and is working to provide guidance for QAQC of all types of data, and for integrating QAQC into all sampling programs. This promotes consistency across time, people, and space, increases transparency, and gives users more confidence in your data.
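As a concrete illustration of the “quality control” half, many programs run automated range checks that flag suspect values rather than silently deleting them, so a reviewer can decide what each flag means. This is only a sketch with invented bounds and field names, not an IEP standard:

```python
# Plausible-range bounds are invented for illustration; a real SOP
# would document the bounds it actually uses for each field.
PLAUSIBLE = {
    "water_temp_c": (0.0, 40.0),
    "salinity_psu": (0.0, 35.0),
}

def qc_flags(record):
    """Return a list of QC flags for one field record (empty = clean)."""
    flags = []
    for field, (lo, hi) in PLAUSIBLE.items():
        value = record.get(field)
        if value is None:
            flags.append(f"{field}: missing")
        elif not lo <= value <= hi:
            flags.append(f"{field}: out of range ({value})")
    return flags

record = {"water_temp_c": 22.5, "salinity_psu": 48.0}
print(qc_flags(record))  # ['salinity_psu: out of range (48.0)']
```

Flag-and-review (instead of delete) is what keeps the QC step transparent: the raw data survive, and the flags travel with them.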

Dataset integration

One of the great things about laying down the framework for open data that includes data publication, documentation, and quality control is that it then becomes much easier to integrate datasets across programs. The IEP synthesis team (spearheaded by Sam Bashevkin of the Delta Science Program) has developed several integrated datasets that pull publicly accessible data, put them in a standard format, and publish them in a single, easy-to-use format.

Spreading the Word

We’re also making sure EVERYONE knows about how great our data are.

  • We’ve revamped the data access webpage on our IEP site.
  • Publishing data on EDI makes it available on DataOne, which allows searches across multiple platforms.
  • Publishing data papers is a relatively new way to let people know about a dataset. For example, this zooplankton data paper was recently published in PLOS ONE.
  • We’ve made presentations at the Water Data Science Symposium and other scientific meetings.
  • We published an Open Data Framework Essay in San Francisco Estuary and Watershed Science.
  • We also put on a Data Management Showcase (video) that you can watch via the Department of Water Resources YouTube Channel.
  • Plus, we have lots more data management resources available on the DUWG website.

Together, we're putting IEP Data on the Open Science Train to global recognition. 

Questions? Feel free to reach out to the DUWG co-chairs: Rosemary Hartman and Dave Bosworth. If you have any suggestions for improving data management or sharing, we want to hear about it.

Two birds are in a fountain labeled Fountain of Open Data. One asks: You mind if I reuse this data? The other says: Go ahead! we can even work together on it.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY license. DOI: 10.5281/zenodo.3332807

Further Reading

Categories: BlogDataScience
  • March 1, 2022

We all know climate change is going to be rough. We expect increases in temperature, changes in rainfall (where, when, and how much), and local extinctions or migration of plants and wildlife as the climate shifts. Climate change can sound abstract and is often spoken of as a phenomenon of the future, despite the changes we are already seeing in our surroundings. These changes affect the San Francisco Estuary and will eventually make it necessary to adjust the way we manage our water in California if we want to lessen the impact on those ecosystems. To better understand the impacts of climate change and to better inform management strategies, a group of Interagency Ecological Program (IEP) scientists wanted to find out how much is known about climate change in the Sacramento-San Joaquin Delta, Suisun Bay and Suisun Marsh and how management actions can lessen these effects. To do this, they gathered scientists with broad expertise – from zooplankton to aquatic vegetation – and created the Climate Change Project Work Team.

The team decided to start by creating a conceptual model (similar to the Baylands Goals model created for San Francisco Bay) and synthesizing already-published research in a technical report. A conceptual model is an organized, visual way of thinking through a particular problem, system, or idea, making connections easier to see and understand. Conceptual models are especially helpful for groups: everyone participates as the model is developed, so everyone has to think through the problem and understands why the model looks the way it does when it’s done. The group’s climate change conceptual model let them see how the Estuary responds to different environmental drivers, which in turn showed them what subjects to read about to find the answers they were looking for. The model (Figure 1) starts with global-scale changes in the top box, which impact landscape-scale environmental conditions in the Estuary. Those landscape-scale conditions influence site-level environmental change. For example, increases in global air temperature cause increases in water temperature in the rivers and bays, which in turn impact the temperatures experienced by each critter in the rivers. These climate-change effects also interact with landscape management (such as levee construction or wetland restoration) to shape the aquatic environment at a site.

Landscape impacts from climate change (for example, sea level rise, temperatures, and salinity field) impact local scale factors within an ecosystem.

Figure 1. The Climate Change Project Work Team's conceptual model.

Putting together the conceptual model and writing a synthesis of what we know so far is useful in other ways as well. It helped the team identify what we need to study more before we can give better answers about what will happen in our aquatic ecosystems. The model highlighted three aquatic ecosystems in the Estuary where organisms will experience different effects from climate change. The largest ecosystem in the Estuary today is open water. Marshes and floodplains make up a much smaller proportion of the habitat, but are still highly important to native species. Three teams of scientists went on to review the literature on these ecosystems, diving into the current status of fish, benthic invertebrates, plankton, and aquatic vegetation, and trying to predict changes and risks.

So, what did the teams find?

Out of the three, the open-water ecosystem will be most impacted by drought and warmer temperatures. These changes will make the ecosystem more suitable for many invasive fishes, invertebrates, and aquatic vegetation, though higher-salinity conditions during droughts may also favor some native fishes and aquatic vegetation (Figure 2). Predictions of future Delta temperatures suggest that Delta Smelt’s spawning window may be greatly restricted, further stressing this endangered fish (Brown et al. 2016).

Diagram showing current status of open water ecosystems, including invasive fish, weeds, and clams.

Climate change effects on open water ecosystems includes increased temperatures, increased invasive fish, and increased harmful algal blooms.

Figure 2. Impacts of climate change in open water ecosystems include harmful algal blooms, increased invasive clams, increased aquatic weeds, and increased invasive fishes, such as largemouth bass and Mississippi silversides.

Floodplains will experience major changes in timing and magnitude of inundation. Precipitation will become more variable with more frequent extreme floods and droughts. The larger storms we have seen lately benefit floodplains and the native fish that use them to spawn and feed, but only if they occur at the right time. Floods will shift to earlier in the season as more precipitation falls as rain instead of snow, keeping migratory species from being able to use the floodplain when they need it. More frequent droughts will mean the floodplain may not be available at all for years at a time (Figure 3). Management actions that increase the frequency or duration of floodplain inundation, such as the Yolo Big Notch Project, may become more important if floodplains are to be sustainable in the future.

Diagram showing current status of floodplains in the Delta. Most floodplain habitat is restricted to the Yolo Bypass and Cosumnes, but is important spawning and rearing habitat.

Aquatic fish and other aquatic life will have reduced use of the floodplains due to reduced frequency of inundation from extended periods of drought.

Figure 3. Floodplains, which are important habitat for spawning Sacramento Splittail and juvenile Chinook Salmon will not be inundated as frequently as droughts become more frequent, and may experience earlier flooding as more precipitation falls as rain instead of snow.

Tidal marshes are relatively scarce, but very important habitats. They provide food and nursery habitat for many fish and waterbird species. Whether they will continue to exist where they are will depend on the amount of sediment that will deposit in the marshes to keep up with sea level rise. Some models show that the larger storms will bring more sediment to the Delta which will help the marshes remain, but other models show that much of our tidal marsh will drown, especially if they do not have gentle, sloping transitions to uplands. Restoration planners may need to prioritize areas with adequate transition zones if they want restoration sites to be sustainable in the long-term.

Diagram showing current status of tidal wetlands in the Delta. Wetlands are relatively rare, but provide important rearing habitat with high food availability.

Tidal wetland size and functionality will be reduced due to sea level rise, increased temperatures, and invasive species.

Figure 4. Tidal marshes may drown as sea levels rise unless they have gentle transitions to upland areas. They may also experience the same increases to invasive species and increased temperature as open water ecosystems.

Other members of the Climate Change PWT have been analyzing temperature trends in our monitoring record. They have found evidence of increased temperatures over the past 50 years (Bashevkin et al. 2021), lower temperatures during wetter years (Bashevkin and Mahardja 2022), differences in temperature between the top and bottom of the water column (Mahardja et al. 2022), and hotter temperatures in the South Delta (Pien et al., draft manuscript).

For a young-adult audience interested in learning more about the San Francisco Estuary and the Sacramento-San Joaquin Delta, and how climate change will affect them and the species living there, check out a collection called Where the river meets the ocean – Stories from San Francisco Estuary. Many of the scientists on the team that wrote the Climate Change Technical Report also wrote for this collection, published by Frontiers for Young Minds.

Further Reading:

Bashevkin, S. M., and B. Mahardja. In press. Seasonally variable relationships between surface water temperature and inflow in the upper San Francisco Estuary. Limnology and Oceanography.

Bashevkin, S. M., B. Mahardja, and L. R. Brown. 2021. Warming in the upper San Francisco Estuary: Patterns of water temperature change from 5 decades of data.

Brown, L. R., L. M. Komoroske, R. W. Wagner, T. Morgan-King, J. T. May, R. E. Connon, and N. A. Fangue. 2016. Coupled downscaled climate models and ecophysiological metrics forecast habitat compression for an endangered estuarine fish. PLoS ONE 11(1):e0146724.

Colombano, D. D., S. Y. Litvin, S. L. Ziegler, S. B. Alford, R. Baker, M. A. Barbeau, J. Cebrián, R. M. Connolly, C. A. Currin, L. A. Deegan, J. S. Lesser, C. W. Martin, A. E. McDonald, C. McLuckie, B. H. Morrison, J. W. Pahl, L. M. Risse, J. A. M. Smith, L. W. Staver, R. E. Turner, and N. J. Waltham. 2021. Climate Change Implications for Tidal Marshes and Food Web Linkages to Estuarine and Coastal Nekton. Estuaries and Coasts.

Dettinger, M., J. Anderson, M. Anderson, L. Brown, D. Cayan, and E. Maurer. 2016. Climate change and the Delta. San Francisco Estuary and Watershed Science 14(3).

Knowles, N., C. Cronkite-Ratcliff, D. W. Pierce, and D. R. Cayan. 2018. Responses of Unimpaired Flows, Storage, and Managed Flows to Scenarios of Climate Change in the San Francisco Bay-Delta Watershed. Water Resources Research 54(10):7631-7650.

Mann, M. E., and P. H. Gleick. 2015. Climate change and California drought in the 21st century. Proceedings of the National Academy of Sciences 112(13):3858-3859.

Categories: BlogDataScience, General
  • December 30, 2021

Lots of Interagency Ecological Program (IEP) scientists research fish. Of the 22 surveys in IEP's Research Fleet, 17 are primarily focused on fish. But fish in the San Francisco Estuary are hard to catch these days. Over the past thirty years, Delta Smelt, Longfin Smelt, and even the notoriously hardy Striped Bass have declined precipitously (CDFW FMWT data). To figure out how to reverse these declines, we need an understanding of the “bottom-up” processes that exert control on these populations—we need to study fish food. Therefore, we need to increase our understanding of what pelagic fish eat: zooplankton.

Magnifying glass with cartoon images of several zooplankters

If you’ve spent any time around fish people, you’ve probably heard the word “zooplankton”, but you might not really know what it means. Zooplankton are small animals that live in open water and cannot actively swim against the current (“plankton” comes from the Greek for “drifting”). They include crustaceans (copepods, water fleas, larval crabs, etc.), jellyfish, rotifers, and larval fish. Most of them are hard to see without a microscope, so they are easy to overlook – but you’d miss them if they weren’t there, because most of your favorite fish rely on zooplankton for food.

Fortunately, the IEP Zooplankton Project Work Team has been tackling the problem head-on. The group got started when Louise Conrad and Rosemary Hartman were both collecting zooplankton samples near the same restoration site. They thought “We’d be able to say a lot more about the restoration site if we combined our data sets!” But with samples collected using different gear and identified by different taxonomists, it proved more difficult than they originally thought. They needed a team of experts to help them figure out how to deal with the differences in their data. So the Zooplankton Synthesis Team was born! The original team included Karen Kayfetz, Madison Thomas, April Hennessy, Christina Burdi, Sam Bashevkin, Trishelle Tempel, and Arthur Barros, but soon grew as more people heard about the discussions they were having.

The team started by identifying the major zooplankton datasets that IEP collects and dealing with tricky data integration questions:

  • Can you integrate data sets when the critters were collected with different mesh sizes?
  • What do you do when one data set identifies the organisms to genus and another one identifies down to species?
  • What if these levels of identification change over time?
  • Does preservation method impact the dataset?

diagram of three data sets being put into a machine and turning into one data set

To integrate data sets, the team standardized variable names, standardized taxon names, and summarized taxa based on their lowest common level of resolution.
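Those three harmonization steps can be sketched in a few lines. This is a simplified illustration with invented column and taxon names, not the ZoopSynth code itself; the published data package documents the real crosswalk.

```python
# Two hypothetical survey extracts with different column names and
# different taxonomic resolution (one to species, one to genus).
survey_a = [{"SampleDate": "2020-05-01", "Taxon": "Pseudodiaptomus forbesi", "CPUE": 120.0}]
survey_b = [{"date": "2020-05-02", "taxname": "Pseudodiaptomus", "cpue": 85.0}]

# Step 1: standardize variable names via per-survey rename maps.
RENAME_A = {"SampleDate": "date", "Taxon": "taxon", "CPUE": "cpue"}
RENAME_B = {"date": "date", "taxname": "taxon", "cpue": "cpue"}

# Step 2: standardize taxon names and roll species up to the lowest
# common level of resolution (here, genus) so surveys are comparable.
TO_GENUS = {
    "Pseudodiaptomus forbesi": "Pseudodiaptomus",
    "Pseudodiaptomus": "Pseudodiaptomus",
}

def harmonize(rows, rename, source):
    out = []
    for row in rows:
        std = {rename[k]: v for k, v in row.items()}
        std["taxon"] = TO_GENUS[std["taxon"]]  # lowest common resolution
        std["source"] = source                 # keep provenance
        out.append(std)
    return out

combined = harmonize(survey_a, RENAME_A, "Survey A") + harmonize(survey_b, RENAME_B, "Survey B")
print(combined)
```

The cost of this approach is visible in the sketch: rolling up to genus throws away the species-level detail one survey collected, which is exactly the trade-off the team had to weigh for each taxon.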

While working through these sticky questions, they compiled what they learned about the individual zooplankton surveys into a technical report (PDF) describing each survey and how they are similar and different. They published a data package integrating five different surveys into a single dataset and Sam put together a fantastic web application that allows users to filter and download the data with a click of a button.

The team had put together the data, but there was more work to do. They realized they needed to do more if they wanted people to use their data. Lots of data on zooplankton get collected, but few research articles are published about zooplankton, and zooplankton data are rarely used to inform management decisions. To get the broader scientific community excited about zooplankton in the estuary, the ZoopSynth team worked with the Delta Science Program to host a Zooplankton Ecology Symposium with zooplankton researchers from across the estuary and across the country (you can watch the Symposium recording on YouTube). From this symposium they learned a few important lessons to help increase communication and visibility of zooplankton data and research:

  • Managers and scientists should work together to develop clear goals and objectives for management actions. Is there a threshold of zooplankton biomass or abundance to achieve? Or is the goal simply higher biomass of certain taxa? This will make it easier to design a study that provides management-relevant results.
  • Scientists should understand the management goals and keep the end goal in mind. If the end goal is fish food, study taxa that are most common in fish diets. If the primary interest is contaminant effects, focus on sensitive species.
  • We need to start using new tools like automated imagery and DNA along with traditional microscopy to collect better data faster.
  • We need to maximize the accessibility of zooplankton data to scientists and managers. Scientists should share data in publicly available places in easy-to-read formats. Similarly, managers should share lessons learned from management actions widely, and use them for adaptive management. Both scientists and managers should be encouraged to ask questions of each other to ensure both understand the best uses for zooplankton data.

These lessons (and more!) are summarized in a recent essay published in San Francisco Estuary and Watershed Science. If that’s too much reading, the team also produced some fact sheets summarizing the major take-home messages of the essay and the symposium:

The team has expanded into an official IEP Project Work Team that meets monthly to discuss new zooplankton research ideas, share analyses, look at cool pictures of bugs, and talk about trends. If you’re interested in joining, contact Sam at Sam.Bashevkin@Deltacouncil.ca.gov

diagram of organism giving presentation

Categories: BlogDataScience, General
  • August 13, 2021

When you are running a long-term monitoring program, it’s easy to keep plugging away, doing the same old thing over and over again. That’s what “long-term monitoring” is all about, right? But is the survey we designed 40 years ago still giving us useful data? With new sampling gear, new statistics, and new mandates, can we improve our monitoring to better meet our needs? These questions have been on the minds of Interagency Ecological Program (IEP) researchers, so an elite team spent over a year on a rigorous evaluation of three of IEP’s fisheries surveys to figure out how we can improve our monitoring program. The team was assembled with representatives from multiple agencies who each brought something to the table: guidance and facilitation, experience using the data, regulatory background, quantitative skills, and outside statistical expertise. This wasn’t the first time IEP reviewed itself, but it was the first time it took a really quantitative look. The team focused on assessing the ability of the datasets to answer thematically grouped management questions, so multiple surveys were reviewed together.

The Team

  • Dr. Steve Culberson, IEP Lead Scientist - Guidance and Facilitation
  • Stephanie Fong, IEP Program Manager – Guidance and Facilitation
  • Dr. Jereme Gaeta, CDFW Senior Environmental Scientist – Quantitative Ecologist
  • Dr. Brock Huntsman, USGS Fish Biologist – Quantitative Ecologist
  • Dr. Sam Bashevkin, DSP Senior Environmental Scientist – Quantitative Ecologist
  • Brian Mahardja, USBR Biologist – Quantitative Ecologist and Data User
  • Dr. Mike Beakes USBR Biologist – Quantitative Ecologist and Data User
  • Dr. Barry Noon, Colorado State University – Independent statistical consultant
  • Fred Feyrer, USGS Fish Biologist – Data User
  • Stephen Louie, State Water Board Senior Environmental Scientist – Regulator
  • Steven Slater, CDFW Senior Environmental Scientist - Principal Investigator – FMWT
  • Kathy Hieb, CDFW Senior Environmental Scientist – Principal Investigator – Bay Study
  • Dr. John Durand – UC Davis, Principal Investigator – Suisun Marsh Survey

The Surveys

  • Fall Midwater Trawl (FMWT) – One of the cornerstones of IEP since 1967, this California Department of Fish and Wildlife (CDFW) survey runs from September-December every year and was originally designed to monitor the impact of the State Water Project and Central Valley Project on yearling striped bass.
  • San Francisco Bay Study – On the water since 1980, Bay Study is also run by CDFW and samples year-round from the South Bay to the Confluence. It also monitors the effects of the Projects on fish communities.
  • Suisun Marsh Survey – Starting in 1979, the Suisun Marsh Survey is conducted by UC Davis with funding from the Department of Water Resources. This survey describes the impact of the Projects and the Suisun Marsh Salinity Control Gates on fish in the Marsh.

The Gear

  • The otter trawl – A big net towed along the bottom behind a boat, this type of net targets fish that hang out on the bottom (“demersal fishes”). This net only samples the bottom in deep water, but will sample most of the water column in shallower channels (less than 3 meters deep).
  • The midwater trawl – Another big net, but this one starts at the bottom and is pulled in toward the boat while trawling, gradually reducing the depth of the net so all depths are sampled equally. This net targets fish that like living in open water (“pelagic fishes”).

The question: Can we make it better?

A group of fish get together and look at a diagram that says: Surveys produce Data that inform decisions that inform mandates.
Figure 1. The team assembled to see how surveys could generate the best data to inform decisions and fulfill their regulatory mandates.

The question seemed simple – but the answer was unexpectedly complex. While the surveys all targeted similar fish, used similar gear, and went to similar places, they all had enough differences in their survey design, mandates, and institutional history that looking at them together wasn’t easy.

The first step in the review process was, perhaps, the most difficult. The team had to get buy-in from all the leaders of the surveys under review, all the regulators mandating that the surveys take place, all the people critical of the surveys as they currently stand, and the supervisors of the team members who were going to devote a large percentage of their time to the effort. Earning the trust of multiple interest groups was challenging, but it was also one of the most rewarding and exciting parts of the process. Stephanie reflected: “We plan to incorporate more of their recommendations in upcoming reviews and increase our collaboration with them… it also would have been helpful if we could have spent more time up front with getting buy-in from those being reviewed and those critical of the surveys.”

Once everyone was on board, the team took a deep dive into the background behind each survey. Why was it established? How have the data been used in the past? Has it made any changes over time? How are the data currently shared and used? Putting together this information gave them a great appreciation for the broad range of experience within IEP. In particular, the team needed to pay attention to the regulatory mandates that first called for the surveys (such as Endangered Species Act Biological Opinions and Water Rights Decisions), to make sure the surveys were still meeting their needs.

The next step was putting the data together, and here’s where it got hard. The team had to find all the data, interpret the metadata, and convert them into standard formats that were comparable across surveys. Even basic things like the names of fish were different. In the FMWT data, a striped bass was “1”, in the Suisun Marsh data a striped bass was “SB”, and in the Bay Study data a striped bass was “STRBAS”. The team quickly identified a few easy steps that could improve the programs without changing a single survey protocol!

  1. Make all data publicly available on the web in the form of non-proprietary flat files (such as text or .csv files)
  2. Create detailed metadata documents describing all the information needed to understand the survey (assume the person reading it is a total stranger who knows nothing about your program!)
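The species-code problem can be sketched with a tiny translation table. The striped bass codes come straight from the surveys described above; the table structure, other details, and function names are illustrative, not the team’s actual tooling.

```python
# Sketch: harmonizing survey-specific species codes to common names.
# The striped bass codes are from the text; everything else is hypothetical.
CODE_MAPS = {
    "FMWT": {"1": "Striped Bass"},
    "SuisunMarsh": {"SB": "Striped Bass"},
    "BayStudy": {"STRBAS": "Striped Bass"},
}

def harmonize(survey, code):
    """Translate a survey-specific species code to a common name."""
    try:
        return CODE_MAPS[survey][code]
    except KeyError:
        raise ValueError(f"Unknown code {code!r} for survey {survey}")

# Three records of the same species, one from each survey:
records = [("FMWT", "1"), ("SuisunMarsh", "SB"), ("BayStudy", "STRBAS")]
common = [harmonize(survey, code) for survey, code in records]
print(common)  # all three resolve to 'Striped Bass'
```

With a shared table like this checked into a public repository, each survey can keep its historical codes while downstream users work with one consistent naming scheme.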

Figuring out better methods of storing and sharing data is relatively easy, but how do we decide whether we should change when and where and how the surveys actually catch fish? The surveys were all intended to track changes in the fish community, but community-level changes are complex, with over 100 fish species in the estuary. The team decided to divide the task into three parts:

  1. Figure out how to quantify bias between the surveys for individual fish (seeing if some surveys are more likely to catch certain species than the other surveys, Figure 3).
  2. Create a better definition of the “fish community” by identifying which groups of species are caught together most often (Figure 2).
  3. See what happens when we change how often we sample or how many stations we sample. Do we lose any information if we do less work?

Image of a classification tree with four groups of fish labeled: Brackish, Fresh, Marine, and Grunion.
Figure 2. The quantitative ecologists used a form of hierarchical clustering to figure out which groups of fish are most frequently caught together, and which species is most indicative of each group. The indicator species are the ones with the gold stars. Figure adapted from IEP Survey Review Team (2021).

Going through this process involved pulling out all the fancy math and computer programming. Sam, Brian, Jereme, Mike, and Brock explored the world of generalized additive models, principal tensor analysis, Bayesian generalized linear mixed models, hierarchical cluster analysis, and things involving overdispersion in negative binomial distributions. If there were a way to Math their way to the answer, they were going to find it!
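To give a feel for the cluster analysis, here is a toy sketch of hierarchical clustering on species co-occurrence. The presence/absence data and species list below are invented for illustration, and the team’s actual analysis was considerably more sophisticated.

```python
# Toy hierarchical clustering of species by co-occurrence in trawl samples.
# All data below are invented; this only illustrates the general technique.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

species = ["Striped Bass", "Delta Smelt", "Dungeness Crab", "English Sole"]
# Rows = species, columns = trawl samples (1 = caught, 0 = not caught).
catches = np.array([
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
])

# Jaccard distance: species caught in the same samples end up close together.
dist = pdist(catches, metric="jaccard")
tree = linkage(dist, method="average")
groups = fcluster(tree, t=2, criterion="maxclust")

for name, group in zip(species, groups):
    print(f"{name}: group {group}")
```

In this toy example the first two species co-occur and the last two co-occur, so they fall into two distinct groups, which is exactly the kind of structure Figure 2 summarizes at the scale of the whole estuary.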

Image of a midwater trawl with two fish talking about whether there are any biases in their fishing. They agree that the boat probably catches fewer fish than we think it does.
Figure 3. The team also evaluated biases in sampling gear. Sampling bias occurs when the gear doesn't sample all fish consistently. Sometimes they miss fish of certain sizes, fish that live in certain habitats, or fish that can evade the nets.
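A very crude way to express gear bias for a single species is the ratio of mean catch per tow between two gears. The numbers below are invented, and this simple ratio stands in for the far more sophisticated models (such as Bayesian generalized linear mixed models) the team actually used.

```python
# Crude gear-bias index: ratio of mean catch per tow between two gears
# for the same species. Catch numbers are invented for illustration.
import statistics

otter_trawl_catch = [3, 5, 2, 4, 6, 3]        # hypothetical catches per tow
midwater_trawl_catch = [9, 12, 8, 11, 10, 9]  # hypothetical catches per tow

ratio = statistics.mean(midwater_trawl_catch) / statistics.mean(otter_trawl_catch)
print(f"Midwater trawl catches ~{ratio:.1f}x more of this species per tow")
```

A ratio far from 1 suggests the gears see the species very differently, so comparing raw counts across surveys without a correction would be misleading.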

For better or worse, Math and a 1-year pilot effort will only get you so far. The team could develop some recommendations, scenarios, and new methods, but it will be up to management to decide how to continue the review effort and then implement change. Their results highlighted a few key points that will be useful in reviewing the rest of IEP’s surveys and making decisions about changes:

  1. Involving stakeholders early in the review process will increase transparency, facilitate sharing of ideas, and promote community understanding.
  2. We need to characterize the biases of our sampling gear in order to make stronger conclusions about fish populations.
  3. Identifying distinct communities of fish helps us track changes over time and space.
  4. We can use Bayesian simulation methods to test the impacts of altered sampling designs on our understanding of estuarine ecology.
  5. These sorts of reviews take time and effort from a highly skilled set of scientists, so IEP will need to dedicate a lot of staff to a full review of all its surveys.
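Point 4 can be illustrated with a bare-bones Monte Carlo sketch: simulate many surveys at different numbers of stations and see how the precision of the resulting abundance index degrades. All numbers here are invented, and this simple simulation does not replicate the Bayesian methods the team actually used.

```python
# Minimal Monte Carlo sketch: how does reducing the number of stations
# affect the precision of a mean-catch index? All values are invented.
import random
import statistics

random.seed(42)

TRUE_MEAN = 12.0  # hypothetical "true" mean catch per trawl

def simulate_survey(n_stations):
    """One simulated survey: a simplistic normal catch model per station."""
    catches = [random.gauss(TRUE_MEAN, 5.0) for _ in range(n_stations)]
    return statistics.mean(catches)

for n in (40, 20, 10):
    estimates = [simulate_survey(n) for _ in range(1000)]
    spread = statistics.stdev(estimates)
    print(f"{n} stations: spread of the index across simulations = {spread:.2f}")
```

The spread of the index grows as stations are dropped, which is the basic trade-off the review quantified: fewer tows means less work, but also a noisier picture of the fish community.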

Further Reading

Categories: Blog, Data Science, General