Science Stories: Adventures in Bay-Delta Data

  • November 22, 2023
Adult Chinook Salmon with a fin tag being released by a scientist into a river. Photo from CDFW.

By Peter Nelson

The Challenge

Spring-run Chinook salmon (“spring-run”) are listed as threatened under both the California Endangered Species Act and the Federal Endangered Species Act. Like most salmon, these fish are anadromous: The adults, having grown and matured in the ocean, return to their natal stream to spawn, and the juveniles, after rearing in freshwater, eventually migrate downstream to the ocean (see Figure 1). During that downstream migration, juvenile salmon are exposed to a gauntlet of threats, including warm water temperatures, predators of all sorts, and “taking the wrong turn” through water diversions and getting lost on their way to the ocean. Managing or reducing the risk posed by water diversions is a responsibility of the Department of Water Resources, and to do that, water managers need to know the number and timing of those outmigrating juvenile spring-run as they enter the Delta. Coming up with an accurate prediction of this, termed a Juvenile Production Estimate (PDF) or JPE, is not simple. This is the first of a two-part piece about our efforts to develop a JPE, covering what’s been accomplished, what’s planned, and the timeline.

Diagram of spring-run salmon life cycle showing adults migrating upstream into the mountains where they hang out in cold water pools below dams all summer before spawning. Juveniles then travel through the delta to the ocean to mature.

Figure 1. Spring-run Chinook have a complex life cycle. The adults migrate upstream in January through March, but instead of spawning right away like most salmon, they hold in coldwater pools all summer and spawn in the fall. Diagram by Rosemary Hartman, Department of Water Resources.

The Approach

We know when to expect adult spring-run to return to their natal streams to spawn based on past experience: Humans, beginning with the indigenous peoples of the West Coast, have been observing these runs for generations, and we might reasonably expect that the numbers of returning adult salmon are a decent predictor of the juvenile fish those returning salmon will eventually produce. Observations by multiple teams of biologists of adult salmon throughout the Central Valley allow us to predict the likely numbers of juvenile spring-run expected to migrate downstream and enter the Delta on the way to the Pacific Ocean each year. “Hold on a minute,” you might say, “What about the water in those streams? If the creeks are low and the water is warm, surely those baby salmon won’t do as well as they might when conditions are good.” You’d be right! The number of reproducing salmon—the parents—isn’t a perfect predictor of the number of offspring: There are many environmental factors that affect juvenile production, but, based on past studies of salmon ecology, we can include factors like flow in our analysis of the likely number of juveniles that will be produced by the annual return of adult salmon (for example, see Michel 2019; Singer et al. 2020).

These estimates, however, are just that: we can’t know exactly how the varying amount of water will affect the survival of juvenile salmon as they grow and migrate, but we should get reasonably close. We also have another source of information to improve our estimates: the number of outmigrating juveniles that we observe directly as they swim towards the Delta. The streams where spring-run spawn regularly have rotary screw traps (Video) (RSTs, Figure 2) on them. These devices divert migrating juveniles into a holding pen where biologists count and measure them each day before releasing them back into the stream to continue their journey to the sea. Data from these RSTs give us another check on our estimates based on spawner production, and are themselves an alternative means for estimating spring-run juvenile production.

A rotary screw trap with a conical trap and surrounding deck deployed in a narrow channel with trees growing on the bank.

Figure 2. A rotary screw trap floating in the Yolo Bypass Toe Drain with its cone out of the water (not sampling). Photo courtesy of the Department of Water Resources.

One last point: For water managers to use these predictions of how many spring-run are expected to reach the Delta (and when), the estimates must be produced each year before spring-run enter the Delta, in time for water managers to make decisions about their operations. This is especially tricky for estimates that rely on RST data because it takes only a few weeks for juvenile salmon to travel from the RSTs to the Delta. This means that the process of counting adult salmon and (especially) juvenile salmon in the RSTs, entering those data into a shared database, and crunching the numbers to produce a JPE must be fast, efficient, and accurate.

Gathering Information

This is a collaborative, interagency effort, which we began by holding a broad-based, public workshop in September 2020 with the Department of Fish and Wildlife (see Nelson et al. 2022 for details) and writing a science plan (PDF) with our agency partners to determine what monitoring data were needed to develop a spring-run JPE. Estimating an annual spring-run JPE is complicated by (1) the broad geographic and geologic range of Central Valley streams that support spring-run, (2) the challenge of developing a holistic, coordinated multi-agency monitoring framework for generating quantitative estimates of juvenile spring-run across their range, (3) the variable life history displayed across the spring-run streams, and (4) the difficulty of distinguishing juvenile spring-run from other run types (fall run, late-fall run, and winter run) found in the same streams (we will talk more about distinguishing salmon run type in our next blog post).

Monitoring

Most of the monitoring in spring-run streams is conducted by the staff of several governmental agencies (e.g., Deer Creek), who gather data on the numbers and timing of returning adults and of migrating juveniles and track the changes in these metrics from year to year. Historically, though, monitoring was designed around local management needs, employed multiple methods, and focused on different life stages across the watershed. Some work has been done to integrate data on the number of returning adults (CDFW's GrandTab dataset, which produced the graph of returning adults, Figure 3 below). However, a spring-run JPE will require a more coordinated approach, with the means of combining data from more than 40 monitoring programs from eight regions, several governmental agencies, and nearly two dozen data stewards and managers, using diverse methods and having large discrepancies in monitoring histories. These are significant challenges, but they can be met as long as we’re aware of the limitations (see below).

Bar graph of returning adult spring-run Chinook salmon in the JPE tributaries. Total escapement varies from over 20,000 to less than 1,000, with Butte Creek having the highest returns.

Figure 3. Total escapement (number of returning adults) by tributary for 2000-2022. Click for an enlarged version broken out by tributary.

In addition to gathering data on the number and timing of returning adults and departing juveniles, we’ll also need data on year-to-year salmon spawning success and on the survival of those outmigrating juveniles as they move from higher elevation habitats through lower, slower and warmer tributaries, and as they migrate down the mainstem of the Sacramento River to finally reach the Delta (streams with major spring-run spawning are shown in Figure 4).

Environmental conditions too are crucial: Preeminent are the quantity of water in the system and water temperature; we know that these have strong effects on salmon survivorship and behavior. The number and location of predators also vary from year to year and can affect the number of juvenile spring-run reaching the Delta.

Map of the Sacramento Valley watershed highlighting Clear Creek, Butte Creek, Battle Creek, Deer Creek, and the Feather River, where spring-run spawn.

Figure 4. Map of the Sacramento River watershed highlighting the rivers and streams where data are being collected for the spring-run JPE. Some spring-run also spawn in the San Joaquin watershed, but they have not been added to the spring-run JPE dataset yet.

Data Management

You may have heard the expression, “garbage in, garbage out”? Wherever the phrase originated, it certainly applies to ecology! Quality data and metadata (how, when, where, and by whom the data are collected) are critical to an accurate spring-run JPE and its application to salmon conservation and water management. DWR led the formation of a team to design a data management system. This team conducted extensive outreach to the various monitoring programs for the seven spring-run spawning streams identified as most important to the JPE.

This data management system is now a reality, and is designed to provide timely access to machine-readable monitoring data and metadata. To meet the annual deadlines for calculating a spring-run JPE, new RST data must be compatible across programs and reported rapidly. Building the initial dataset took over a year because of historical inconsistencies in data reporting across monitoring programs, but state and federal agencies are collaborating to make newly collected data compatible from the moment of data entry. Data from some monitoring programs are now entered digitally and uploaded automatically from the field each day; the rest of the monitoring programs will move to this “field-to-cloud” data entry system over the next several years, improving data quality and greatly easing access. All historical RST data are now publicly available from the Environmental Data Initiative (use search term “JPE”), and new RST data will be added to this repository on a weekly basis. Indeed, one of the most exciting and novel aspects of the spring-run JPE effort is that it has unified much of the existing data reporting from multiple agencies’ monitoring, along with new monitoring, under a common goal and purpose.

The spring-run JPE data management program

  • has now standardized data collection methodologies, schemas, encodings, and processing protocols;
  • produces machine-readable data for all RST monitoring programs (adult data will follow soon);
  • uploads data in near real-time to a shared data management system; and
  • makes data publicly accessible in a simple format.
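
To make the idea of a standardized schema concrete, here is a minimal Python sketch of checking one RST record against a shared schema before it is uploaded. The field names, types, and the `validate_record` helper are hypothetical, invented for illustration; they are not the schema the JPE data management program actually uses.

```python
from datetime import date

# Hypothetical shared schema: field name -> (expected type, required?).
# These field names are illustrative assumptions, not the JPE program's schema.
RST_SCHEMA = {
    "stream": (str, True),
    "site": (str, True),
    "sample_date": (date, True),
    "count": (int, True),
    "fork_length_mm": (float, False),
}


def validate_record(record, schema=RST_SCHEMA):
    """Return a list of problems found in one record (empty list = valid)."""
    problems = []
    for field, (expected_type, required) in schema.items():
        value = record.get(field)
        if value is None:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(value, expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    # Flag fields that the shared schema does not define.
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


# A record that matches the schema, and one that does not (made-up values).
good = {"stream": "Mill Creek", "site": "example site",
        "sample_date": date(2023, 3, 1), "count": 12}
bad = {"stream": "Mill Creek", "sample_date": "2023-03-01",
       "count": "12", "notes": "x"}

print(validate_record(good))
for problem in validate_record(bad):
    print(problem)
```

Running a check like this at the moment of data entry is one way records from different programs could be kept compatible from the start, rather than reconciled after the fact.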

This system allows us to look at all the different data sources at once to learn new things! For example, if we plot the catch of salmon from the rotary screw traps at Mill Creek, the Feather River, Knights Landing, and Delta Entry from upstream to downstream (Figure 5) we see that the most upstream site (Mill Creek) catches salmon earlier than the downstream sites and catches a lot more of them. Moving downstream the catch gets smaller and smaller as juvenile salmon get lost, eaten, or die along the way. Mill Creek also has juvenile salmon leaving the stream as late as May or June, but very few of these fish make it all the way down to the Delta, indicating that later migrants might have a harder time surviving.

Ridgeline plot showing timing and number of juvenile outmigrants at Mill Creek, Feather River, Knights Landing, and Delta Entry.

Figure 5. Plot of rotary screw trap catch over time for the spring of 2023 at several locations in the Central Valley.
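
For readers who want to try this kind of upstream-to-downstream comparison themselves, the sketch below summarizes raw catch by site using only the Python standard library. The CSV column names and the catch numbers are made up for illustration; the published datasets will have their own layout, and raw catch here is not adjusted for sampling effort.

```python
import csv
import io
from collections import defaultdict

# Hypothetical daily catch rows; real column names and values will differ.
EXAMPLE_CSV = """site,date,count
Mill Creek,2023-01-10,40
Mill Creek,2023-02-01,60
Knights Landing,2023-02-15,12
Knights Landing,2023-03-01,8
Delta Entry,2023-03-10,3
"""

# Upstream-to-downstream order, used only to sort the printed summary.
SITE_ORDER = ["Mill Creek", "Feather River", "Knights Landing", "Delta Entry"]


def summarize_catch(csv_text):
    """Return {site: (total catch, first date with catch > 0)}."""
    totals = defaultdict(int)
    first_date = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        count = int(row["count"])
        if count <= 0:
            continue
        site = row["site"]
        totals[site] += count
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if site not in first_date or row["date"] < first_date[site]:
            first_date[site] = row["date"]
    return {site: (totals[site], first_date[site]) for site in totals}


summary = summarize_catch(EXAMPLE_CSV)
for site in SITE_ORDER:
    if site in summary:
        total, first = summary[site]
        print(f"{site}: total catch {total}, first catch {first}")
```

Sites with no rows in the input (the Feather River in this made-up example) are simply skipped in the printout.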

In our next post on the spring-run JPE, we’ll describe the cutting-edge genetic tools we’re using to distinguish spring-run from the other Central Valley Chinook; the quantitative modeling we’re developing, which pulls in all of the salmon and environmental data and produces a juvenile production estimate along with an indication of our confidence in that estimate; the peer-review process that will critique our program and recommend improvements; and where we expect to take this spring-run JPE program next.

  • July 5, 2022

Authors: Rosemary Hartman and the IEP Data Utilization Work Group

Here at IEP, we collect a lot of data, and we do a lot of science. However, people haven’t always realized how much data we collect because it hasn’t always been easy to find. For scientists who were able to find the data, sometimes it was difficult to understand or it was shared in a hard-to-use format. That’s why IEP’s Data Utilization Work Group (DUWG) has been pushing for more Open Science practices over the past five years to make our data more FAIR (Findable, Accessible, Interoperable, and Reusable). And wow! We’ve come a long way in a short time.

A staircase with FAIR Principles written on it and stick figures climbing it. Circles are around the staircase.  One shows a map pin that says Persistent and Findable. One shows an open lock that says 'Accessible' with meaningful interaction. One shows a person and a puzzle and says 'Reusable with Full Disclosure', and one shows two computers with a line between them and says 'Interoperable'.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY license. DOI: 10.5281/zenodo.3332807

What is Open Science anyway? Well, I was going to call it “the cool-kids club” but really it’s the opposite of a club! It’s the anti-club that makes sure everyone has access to science – no membership required. Open science means that all scientists communicate in a transparent, reproducible way, with open-access publications, freely shared data, open-source software, and openness to diversity of knowledge. Open science encourages collaboration and breaks down silos between researchers – so it’s a natural fit for a 9-member collaborative organization like IEP.

For IEP, the ‘open data’ component is where we’ve really been making strides. While “share your data freely” sounds easy, it’s actually taken a lot of work to make our data FAIR. As government entities, theoretically all of the data we collect are held in the public trust, but putting data in a format that other people can use is not simple. Here are some of the things we have done to make IEP data more open:

Data Management Plans

The first thing the DUWG did was get all IEP projects to fill out a simple, 2-page data management plan outlining what was being done with the data in short, clear sections:

  • Who: Principal investigator and point of contact for the data.
  • What: Description of data to be collected and any related data that will be incorporated into the analysis.
  • Metadata: How the metadata will be generated and where it will be made available.
  • Format: What format the data will be stored in and what format it will be shared in, which may not be the same. For example, you may store data in an Access database but share it in non-proprietary .csv formats.
  • Storage and Backup: Where you will put the data as you are collecting it and how it will be backed up for easy recovery. This is about short-term storage.
  • Archiving and Preservation: This is about long-term storage to keep your data for someone years down the line. This is best done with publication on a data archive platform, such as the Environmental Data Initiative (EDI).
  • Quality Assurance: Brief description of Quality Assurance and Quality Control (QAQC) procedures and where a data user can access full QAQC documentation.
  • Access and Sharing: How can users find your data? Is it posted online or available by request? Are there any restrictions on how the data can be used or shared?

You can find instructions (PDF) and a template (PDF) for Data Management Plans on the DUWG page. All of IEP’s data management plans are also posted on the IEP website.

Data Publication

Many IEP agencies were already sharing data on agency websites, but most of this was done without formal version control, machine-readable metadata, or digital object identifiers (DOIs), making it difficult to track how data were being used. Now IEP is recommending publishing data on EDI or other data archives. Datasets now have robust metadata, open, non-proprietary data formats (like .csv tables instead of Microsoft Access databases), and DOIs for each version so studies using these data can be reproduced easily.

Cartoons of stick people illustrating the phases of the data life cycle with arrows connecting them. Data collection - People with nets catching shapes.  Data processing - people take shapes out of a box labeled short-term storage and lay them out on a table. Data Study and Analysis - people make patterns with the shapes. Data publishing and access - People present the data to an audience. Data Preservation - People put shapes in tubes and boxes. Data re-use - people open tubes and a string of shapes come out. Research ideas - Shapes inside a light bulb.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY license. DOI: 10.5281/zenodo.3332807

Metadata Standards

The term “metadata” can mean different things to different people. Some people may think it simply means the definitions of all the columns in a data set. Some people may think it means a history of changes to the sampling program. Some people think it’s your standard operating procedures. Some people may think it means data about social media networks. What is it? Well, it’s the “who, what, where, when, why, and how” of your data set. It should include everything a data user needs to understand your data as well as you do. The DUWG developed a template for metadata that includes everything we think you should include in full documentation for a dataset. Some of it might not apply to every dataset, but it is a good checklist to get you started.

You can find the Metadata template (PDF) on the DUWG page.

QAQC Standards

The DUWG is just starting to dig into QAQC. Quality assurance is an integrated system of management activities to prevent errors in your data, while quality control is a system of technical activities to find errors in your data. QAQC systems have become standard practice in analytical labs, but the formalization and standardization of QAQC practices is new for a lot of the fish-and-bug-counters at IEP. The DUWG QAQC sub-team developed a template for Standard Operating Procedures (PDF), and is working to provide guidance for QAQC of all types of data, and for integrating QAQC into all sampling programs. This promotes consistency across time, people, and space, increases transparency, and gives users more confidence in your data.
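
As a rough illustration of the "quality control" half of QAQC, finding errors in data that have already been collected, here is a short Python sketch of a few automated checks. The thresholds, field names, and the `qc_flags` helper are hypothetical; real QC rules are specific to each program and are documented in its Standard Operating Procedures.

```python
# Hypothetical limits for illustration; real QC thresholds are program-specific.
PLAUSIBLE_FORK_LENGTH_MM = (20, 300)


def qc_flags(records):
    """Return (index, message) pairs for records that fail simple QC checks."""
    flags = []
    seen = set()
    low, high = PLAUSIBLE_FORK_LENGTH_MM
    for i, rec in enumerate(records):
        # A count of fish can never be negative.
        if rec["count"] < 0:
            flags.append((i, "negative count"))
        # Fork length is optional, but if present it should be plausible.
        length = rec.get("fork_length_mm")
        if length is not None and not (low <= length <= high):
            flags.append((i, "fork length outside plausible range"))
        # The same site and date should not be entered twice.
        key = (rec["site"], rec["date"])
        if key in seen:
            flags.append((i, "duplicate site/date entry"))
        seen.add(key)
    return flags


# Made-up example records.
records = [
    {"site": "A", "date": "2023-03-01", "count": 5, "fork_length_mm": 45.0},
    {"site": "A", "date": "2023-03-01", "count": 7},
    {"site": "B", "date": "2023-03-02", "count": -1, "fork_length_mm": 900.0},
]

for index, message in qc_flags(records):
    print(f"record {index}: {message}")
```

Checks like these only flag suspect records for a person to review; they do not decide on their own whether a value is wrong.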

Dataset Integration

One of the great things about laying down the framework for open data that includes data publication, documentation, and quality control is that it then becomes much easier to integrate datasets across programs. The IEP synthesis team (spearheaded by Sam Bashevkin of the Delta Science Program) has developed several integrated datasets that pull publicly accessible data, standardize them, and publish them in a single, easy-to-use format.
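
The general pattern behind an integrated dataset can be sketched in a few lines of Python: map each program's column names onto one standard layout, then combine the rows. This is only a sketch under assumed inputs. The program names, column names, and values below are invented, and it does not represent the synthesis team's actual code.

```python
# Hypothetical column names from two programs, mapped to one standard layout.
COLUMN_MAPS = {
    "program_a": {"Station": "station", "SampleDate": "date", "Catch": "count"},
    "program_b": {"site_id": "station", "date": "date", "n_fish": "count"},
}


def standardize(rows, source):
    """Rename one program's columns to the standard names and tag the source."""
    mapping = COLUMN_MAPS[source]
    out = []
    for row in rows:
        std = {new: row[old] for old, new in mapping.items()}
        std["count"] = int(std["count"])  # counts arrive as text in this example
        std["source"] = source            # keep track of where each row came from
        out.append(std)
    return out


def integrate(datasets):
    """Combine standardized rows from every program, sorted by date and station."""
    combined = []
    for source, rows in datasets.items():
        combined.extend(standardize(rows, source))
    return sorted(combined, key=lambda r: (r["date"], r["station"]))


# Made-up example inputs.
datasets = {
    "program_a": [{"Station": "X1", "SampleDate": "2023-02-01", "Catch": "4"}],
    "program_b": [{"site_id": "Y2", "date": "2023-01-15", "n_fish": "9"}],
}

for row in integrate(datasets):
    print(row)
```

Keeping a `source` column in the combined table lets a data user trace any row back to the program that collected it.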

Spreading the Word

We’re also making sure EVERYONE knows about how great our data are.

  • We’ve revamped the data access webpage on our IEP site.
  • Publishing data on EDI makes it available on DataONE, which allows searches across multiple platforms.
  • Publishing data papers is a relatively new way to let people know about a dataset. For example, this zooplankton data paper was recently published in PLOS ONE.
  • We’ve made presentations at the Water Data Science Symposium and other scientific meetings.
  • We published an Open Data Framework Essay in San Francisco Estuary and Watershed Sciences.
  • We also put on a Data Management Showcase (video) that you can watch via the Department of Water Resources YouTube Channel.
  • Plus, we have lots more data management resources available on the DUWG website.

Together, we're putting IEP Data on the Open Science Train to global recognition. 

Questions? Feel free to reach out to the DUWG co-chairs: Rosemary Hartman and Dave Bosworth. If you have any suggestions for improving data management or sharing, we want to hear about it.

Two birds are in a fountain labeled Fountain of Open Data. One asks: You mind if I reuse this data? The other says: Go ahead! we can even work together on it.

This image was created by Scriberia for The Turing Way community and is used under a CC-BY license. DOI: 10.5281/zenodo.3332807
