Agency-Grain Census Data Summary Table #1754
shweta487 commented Nov 3, 2025
- Queried ACS data via the Census API and uploaded results to a GCS bucket for later use.
- Processed Census Tract geometry.
- Queried organization data from the data warehouse and stored it in GCS.
- Queried bridge organization GTFS datasets and merged them with the dimension organizations table.
- Loaded transit stop data and merged it with organization information.
- Conducted spatial analysis: stop buffers and census tract intersections.
- Adjusted population and demographic metrics for stop service areas.
nbviewer URLs for impacted notebooks:
hhmckay
left a comment
Nice work! Apologies for not commenting on specific lines; nbviewer sometimes lets me do that, but it didn't work here. Let me know if you have any questions and I can provide more clarity. A few overall comments:
- Where do the definitions for the income categories [extremely low, very low, and low income] come from? If there is a specific statutory definition here, all good. If not, I'd consider the AB-1550 definition. It's somewhat tricky to operationalize but is county specific and more nuanced, especially in higher cost of living areas. I can share R code that I've used for this purpose.
- Do Census tract geometries include water areas? It may be more accurate to use tract geometries that exclude water, but this is only a "nice to have" and shouldn't skew results all that much.
- How did you arrive at the 500 meter buffer? I think that's probably fine, but let's use a more standardized transit stop catchment area, such as half a mile (804.672 meters).
- Since we are calculating these metrics at the agency-level, we will want to add an additional spatial operation to avoid double counting. After buffering the stops, but before intersecting with tracts, dissolve the buffered stops by agency so that each agency has one feature. It is okay if there is overlap between agencies, but there shouldn't be overlap between individual stops of the same agency (hence the need for a dissolve). I think dissolve() in geopandas should achieve this, but there may be other ways as well.
- There's a simpler way to achieve the desired outcomes from cell 51. I'd modify cell 49 to just calculate the ratio between the adjusted area and original area. Then in cell 51, you can just apply it to all the population figures as you currently do, without having to recalculate the weights using the population data.
- Once each adjusted population figure has been calculated, you need to aggregate (sum) by agency. You could also sum by route, or whatever the desired level of aggregation is.
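To make the last two points concrete, here is a minimal sketch in plain Python of the area-ratio weighting followed by the sum by agency. It assumes the stop buffers have already been dissolved by agency and intersected with tracts, so each row is one (agency, tract) piece. The field names (`agency`, `tract_area`, `intersect_area`, `total_pop`, `low_income_pop`) and the sample numbers are hypothetical and are not taken from the notebook.

```python
from collections import defaultdict

# Half-mile catchment radius in meters (1 mile = 1609.344 m).
HALF_MILE_M = 0.5 * 1609.344


def area_ratio(intersect_area, tract_area):
    """Share of a tract's area that falls inside an agency's dissolved buffer."""
    if tract_area <= 0:
        return 0.0
    return intersect_area / tract_area


def adjusted_population_by_agency(rows, pop_fields):
    """Scale every population field by the area ratio, then sum by agency.

    Each row is one (agency, tract) intersection. Because the buffers are
    assumed to be dissolved by agency beforehand, a tract appears at most
    once per agency, so nothing is double counted within an agency.
    """
    totals = defaultdict(lambda: {field: 0.0 for field in pop_fields})
    for row in rows:
        ratio = area_ratio(row["intersect_area"], row["tract_area"])
        for field in pop_fields:
            totals[row["agency"]][field] += row[field] * ratio
    return {agency: dict(values) for agency, values in totals.items()}


# Hypothetical sample data: tract T1 is served by both agencies, which is fine.
rows = [
    {"agency": "A", "tract": "T1", "tract_area": 100.0, "intersect_area": 25.0,
     "total_pop": 1000, "low_income_pop": 200},
    {"agency": "A", "tract": "T2", "tract_area": 200.0, "intersect_area": 100.0,
     "total_pop": 400, "low_income_pop": 100},
    {"agency": "B", "tract": "T1", "tract_area": 100.0, "intersect_area": 50.0,
     "total_pop": 1000, "low_income_pop": 200},
]

result = adjusted_population_by_agency(rows, ["total_pop", "low_income_pop"])
print(result)
```

The same ratio is applied to every population column, so the weights never need to be recomputed from the population data itself. Grouping by a route key instead of `agency` would give route-level totals in the same way.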
@hhmckay I have addressed all the comments except the first one. Could you please review the PR again?