Commit 0ebda36

raeboterikamov
andcommitted
Replace siuba usage with SQLAlchemy for schedule_rt_utils
* add SQLAlchemy models for all impacted tables
* tests for all changed functions
* setup GitHub Actions to run tests
* remove siuba
* pytest-recording for recording and replaying BigQuery requests
* add dependencies needed for testing to requirements.txt

Co-authored-by: Erika Pacheco <erika@ministryofvelocity.com>
1 parent 3abaee3 commit 0ebda36

25 files changed, +2100 -111 lines changed

.github/workflows/pytest.yml

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@

```yaml
name: pytest

on: [push]

env:
  PROJECT_ID: 'cal-itp-data-infra-staging'
  WORKLOAD_IDENTITY_PROVIDER: 'projects/473674835135/locations/global/workloadIdentityPools/github-actions/providers/data-analyses'
  SERVICE_ACCOUNT: 'github-actions-service-account@cal-itp-data-infra-staging.iam.gserviceaccount.com'

jobs:
  test:
    name: Run tests
    runs-on: ubuntu-latest

    permissions:
      contents: read
      id-token: write

    steps:
      - name: Checkout
        uses: actions/checkout@v5

      - name: Authenticate Google Service Account
        uses: google-github-actions/auth@v2
        with:
          create_credentials_file: true
          project_id: ${{ env.PROJECT_ID }}
          workload_identity_provider: ${{ env.WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ env.SERVICE_ACCOUNT }}

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install shared_utils dependencies
        working-directory: _shared_utils/
        run: pip install -r requirements.txt

      - name: Run shared_utils tests
        working-directory: _shared_utils/
        run: pytest tests
```
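The auth step sets `create_credentials_file: true`, so `google-github-actions/auth` writes a credentials file for the workload identity federation and, by default, exports its path as `GOOGLE_APPLICATION_CREDENTIALS`, which Google client libraries read through Application Default Credentials. A minimal sketch of how a test module might check for that before making live BigQuery calls; `has_google_credentials` is a hypothetical helper, not part of this commit, and it only checks that the file exists, not that the credentials are valid.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def has_google_credentials(env: Mapping[str, str] | None = None) -> bool:
    """Return True if GOOGLE_APPLICATION_CREDENTIALS points at an existing file.

    Hypothetical helper: it does not validate the credentials themselves.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and os.path.isfile(path)


if __name__ == "__main__":
    # In CI this should print True after the auth step has run.
    print(has_google_credentials())
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.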

_shared_utils/requirements.txt

Lines changed: 8 additions & 1 deletion
```diff
@@ -1,10 +1,17 @@
 -e .
 altair-transform==0.2.0
+calitp-data-analysis==2025.8.10
 great_tables==0.16.1
+intake==0.6.4
+numba (>=0.62.1, <0.63.0)
+numpy (>=1.26.4, <2.0.0)
 omegaconf==2.3.0 # better yaml configuration
 polars==1.22.0
 pytest (>=8.4.1, <9.0.0)
-quarto-cli==1.6.40
+pytest-mock (>=3.15.1, <4.0.0)
+pytest-recording (>=0.13.4,<0.14.0)
+pytest-unordered (>=0.7.0,<0.8.0)
 quarto==0.1.0
+quarto-cli==1.6.40
 vegafusion==2.0.2
 vl-convert-python>=1.6.0
```
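Most pins use the compact `name==version` form, while `pytest`, `numba`, `numpy` and the new pytest plugins use a parenthesized specifier list such as `numba (>=0.62.1, <0.63.0)`. PEP 508 allows those parentheses, so pip reads that line as `numba>=0.62.1,<0.63.0`. If a tool in the pipeline expects the compact form, a rewrite along these lines would work; `normalize_requirement` is hypothetical and only handles parenthesized lines without extras, environment markers, or inline comments, returning every other line stripped.

```python
import re

# Matches "name (specifiers)", e.g. "numba (>=0.62.1, <0.63.0)".
_PAREN_SPEC = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*\(\s*([^)]*?)\s*\)\s*$")


def normalize_requirement(line: str) -> str:
    """Rewrite 'name (spec, spec)' as 'namespec,spec'; other lines pass through stripped."""
    match = _PAREN_SPEC.match(line)
    if not match:
        return line.strip()
    name, specs = match.groups()
    # Remove the spaces around commas so the specifier list is compact.
    compact = ",".join(part.strip() for part in specs.split(","))
    return f"{name}{compact}"


if __name__ == "__main__":
    for req in ["numba (>=0.62.1, <0.63.0)", "polars==1.22.0", "-e ."]:
        print(normalize_requirement(req))
```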
Lines changed: 31 additions & 26 deletions
```diff
@@ -1,27 +1,32 @@
-from . import (
-    arcgis_query,
-    catalog_utils,
-    dask_utils,
-    geo_utils,
-    gtfs_utils_v2,
-    portfolio_utils,
-    publish_utils,
-    rt_dates,
-    rt_utils,
-    schedule_rt_utils,
-    time_helpers,
-)
+import sys
 
-__all__ = [
-    "arcgis_query",
-    "catalog_utils",
-    "dask_utils",
-    "geo_utils",
-    "gtfs_utils_v2",
-    "portfolio_utils",
-    "publish_utils",
-    "rt_dates",
-    "rt_utils",
-    "schedule_rt_utils",
-    "time_helpers",
-]
+if hasattr(sys, "_called_from_test"):
+    pass
+else:
+    from . import (
+        arcgis_query,
+        catalog_utils,
+        dask_utils,
+        geo_utils,
+        gtfs_utils_v2,
+        portfolio_utils,
+        publish_utils,
+        rt_dates,
+        rt_utils,
+        schedule_rt_utils,
+        time_helpers,
+    )
+
+    __all__ = [
+        "arcgis_query",
+        "catalog_utils",
+        "dask_utils",
+        "geo_utils",
+        "gtfs_utils_v2",
+        "portfolio_utils",
+        "publish_utils",
+        "rt_dates",
+        "rt_utils",
+        "schedule_rt_utils",
+        "time_helpers",
+    ]
```
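The new `__init__.py` skips the package-wide submodule imports when `sys._called_from_test` is set, presumably so tests can import individual modules such as `schedule_rt_utils` without pulling in everything else. The `conftest.py` that sets the flag is not part of this excerpt; a sketch following the pattern described in pytest's documentation would be:

```python
# conftest.py (sketch): sets the flag that the package __init__.py checks.
# The project's actual conftest is not shown in this diff.
import sys


def pytest_configure(config):
    # pytest calls this after parsing options and before collecting tests.
    sys._called_from_test = True


def pytest_unconfigure(config):
    # Remove the flag so it cannot leak into a later, non-test import.
    if hasattr(sys, "_called_from_test"):
        del sys._called_from_test
```

Because the flag lives on `sys`, any module can check it without importing pytest.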
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@

```python
from sqlalchemy import Boolean, Column, DateTime, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class BridgeOrganizationsXHeadquartersCountyGeography(Base):
    __tablename__ = "bridge_organizations_x_headquarters_county_geography"

    organization_key = Column(String, primary_key=True)
    county_geography_key = Column(String)
    organization_name = Column(String)
    county_geography_name = Column(String)
    _valid_from = Column(DateTime)
    _valid_to = Column(DateTime)
    _is_current = Column(Boolean)
```
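A minimal sketch of querying the bridge model with SQLAlchemy's `select()` (1.4+/2.0 style). It runs against an in-memory SQLite database as a stand-in for BigQuery, which the commit's tests reach through recorded requests instead. The model is copied so the block is self-contained; the helper names and sample rows are hypothetical.

```python
from __future__ import annotations

from sqlalchemy import Boolean, Column, DateTime, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class BridgeOrganizationsXHeadquartersCountyGeography(Base):
    # Copy of the model above (same table and column names).
    __tablename__ = "bridge_organizations_x_headquarters_county_geography"

    organization_key = Column(String, primary_key=True)
    county_geography_key = Column(String)
    organization_name = Column(String)
    county_geography_name = Column(String)
    _valid_from = Column(DateTime)
    _valid_to = Column(DateTime)
    _is_current = Column(Boolean)


Bridge = BridgeOrganizationsXHeadquartersCountyGeography  # short alias


def current_county_by_organization(session: Session) -> dict[str, str]:
    """Map organization_name -> county_geography_name, current rows only."""
    stmt = (
        select(Bridge.organization_name, Bridge.county_geography_name)
        .where(Bridge._is_current.is_(True))
        .order_by(Bridge.organization_name)
    )
    return {org: county for org, county in session.execute(stmt)}


def make_demo_session() -> Session:
    """In-memory SQLite stand-in for the warehouse, with hypothetical rows."""
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    session = Session(engine)
    session.add_all(
        [
            Bridge(organization_key="org-a", organization_name="Org A",
                   county_geography_name="County 1", _is_current=True),
            Bridge(organization_key="org-b", organization_name="Org B",
                   county_geography_name="County 2", _is_current=True),
            # Not current: excluded by the _is_current filter.
            Bridge(organization_key="org-c", organization_name="Org C",
                   county_geography_name="County 3", _is_current=False),
        ]
    )
    session.commit()
    return session


if __name__ == "__main__":
    with make_demo_session() as session:
        print(current_county_by_organization(session))
```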
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@

```python
from sqlalchemy import Boolean, Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DimCountyGeography(Base):
    __tablename__ = "dim_county_geography"

    key = Column(String, primary_key=True)
    source_record_id = Column(String)
    name = Column(String)
    fips = Column(Integer)
    msa = Column(String)
    caltrans_district = Column(Integer)
    caltrans_district_name = Column(String)
    place_geography = Column(String)
    organization_key = Column(String)
    service_key = Column(String)
    _is_current = Column(Boolean)
    _valid_from = Column(DateTime)
    _valid_to = Column(DateTime)
```
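Like the bridge table, `DimCountyGeography` carries `_valid_from`, `_valid_to` and `_is_current`, which look like slowly-changing-dimension versioning. Assuming a record's versions cover non-overlapping, half-open `[_valid_from, _valid_to)` windows (this diff does not state the boundary convention), picking the version in effect at a given time reduces to the following; `CountyVersion`, `version_as_of` and `current_version` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CountyVersion:
    # Hypothetical in-memory stand-in for a few DimCountyGeography columns.
    key: str
    name: str
    caltrans_district: int
    _valid_from: datetime
    _valid_to: datetime
    _is_current: bool


def version_as_of(versions: list[CountyVersion], ts: datetime) -> CountyVersion | None:
    """Return the version whose [_valid_from, _valid_to) window contains ts.

    Assumes the windows do not overlap; returns None if none matches.
    """
    for version in versions:
        if version._valid_from <= ts < version._valid_to:
            return version
    return None


def current_version(versions: list[CountyVersion]) -> CountyVersion | None:
    """Return the version flagged _is_current, if any."""
    return next((v for v in versions if v._is_current), None)
```

In SQL the same as-of filter is `_valid_from <= :ts AND _valid_to > :ts`, while `_is_current` covers the common "latest version" case without a timestamp.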
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@

```python
from sqlalchemy import Boolean, Column, Date, DateTime, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DimGtfsDataset(Base):
    __tablename__ = "dim_gtfs_datasets"

    key = Column(String, primary_key=True)
    source_record_id = Column(String)
    name = Column(String)
    type = Column(String)
    regional_feed_type = Column(String)
    backdated_regional_feed_type = Column(String)
    uri = Column(String)
    future_uri = Column(String)
    deprecated_date = Column(Date)
    data_quality_pipeline = Column(Boolean)
    manual_check__link_to_dataset_on_website = Column(String)
    manual_check__accurate_shapes = Column(String)
    manual_check__data_license = Column(String)
    manual_check__authentication_acceptable = Column(String)
    manual_check__stable_url = Column(String)
    manual_check__localized_stop_tts = Column(String)
    manual_check__grading_scheme_v1 = Column(String)
    base64_url = Column(String)
    private_dataset = Column(Boolean)
    analysis_name = Column(String)
    _is_current = Column(Boolean)
    _valid_from = Column(DateTime)
    _valid_to = Column(DateTime)
```
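`DimGtfsDataset` stores both `uri` and `base64_url`. If `base64_url` is the URL-safe Base64 encoding of the dataset URL (inferred from the column name, not confirmed by this diff), the round trip looks like this; both helpers are hypothetical, and the decoder restores stripped `=` padding.

```python
import base64


def encode_url(url: str) -> str:
    """URL-safe Base64 encoding of a URL (assumed format of base64_url)."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")


def decode_url(encoded: str) -> str:
    """Inverse of encode_url; re-adds '=' padding if it was stripped."""
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_url("https://example.com/gtfs.zip")
    print(encoded, decode_url(encoded))
```

The URL-safe alphabet uses `-` and `_` instead of `+` and `/`, so the encoded value can itself appear in a URL path without escaping.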
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@

```python
from sqlalchemy import Boolean, Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DimOrganization(Base):
    __tablename__ = "dim_organizations"

    key = Column(String, primary_key=True)
    source_record_id = Column(String)
    name = Column(String)
    organization_type = Column(String)
    roles = Column(String)
    itp_id = Column(Integer)
    details = Column(String)
    website = Column(String)
    reporting_category = Column(String)
    hubspot_company_record_id = Column(String)
    gtfs_static_status = Column(String)
    gtfs_realtime_status = Column(String)
    _deprecated__assessment_status = Column(Boolean)
    manual_check__contact_on_website = Column(String)
    alias = Column(String)
    is_public_entity = Column(Boolean)
    ntd_id = Column(String)
    ntd_agency_info_key = Column(String)
    ntd_id_2022 = Column(String)
    rtpa_key = Column(String)
    rtpa_name = Column(String)
    mpo_key = Column(String)
    mpo_name = Column(String)
    public_currently_operating = Column(Boolean)
    public_currently_operating_fixed_route = Column(Boolean)
    _is_current = Column(Boolean)
    _valid_from = Column(DateTime)
    _valid_to = Column(DateTime)
```
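This commit replaces siuba pipelines with SQLAlchemy queries. The removed siuba code is not shown in this excerpt, but a typical filter-and-select on `DimOrganization` can be expressed with `select()` roughly as below. The model is trimmed to the columns the query uses, the function name is hypothetical, and the statement is only built and rendered, not executed.

```python
from __future__ import annotations

from sqlalchemy import Boolean, Column, DateTime, Integer, String, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DimOrganization(Base):
    # Trimmed copy of the model above: only the columns this sketch uses.
    __tablename__ = "dim_organizations"

    key = Column(String, primary_key=True)
    name = Column(String)
    itp_id = Column(Integer)
    _is_current = Column(Boolean)
    _valid_from = Column(DateTime)
    _valid_to = Column(DateTime)


def current_organizations_stmt(names: list[str]):
    """Build (not run) a SELECT for current organizations with the given names."""
    return (
        select(DimOrganization.key, DimOrganization.name, DimOrganization.itp_id)
        .where(DimOrganization._is_current.is_(True))
        .where(DimOrganization.name.in_(names))
        .order_by(DimOrganization.name)
    )


if __name__ == "__main__":
    # str() renders generic SQL; running it needs an engine for the target database.
    print(current_organizations_stmt(["Example Transit"]))
```

Chained `.where()` calls are combined with `AND`, which mirrors a sequence of filter steps.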
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@

```python
from sqlalchemy import Boolean, Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DimProviderGtfsData(Base):
    __tablename__ = "dim_provider_gtfs_data"

    key = Column(String, primary_key=True)
    public_customer_facing_fixed_route = Column(Boolean)
    public_customer_facing_or_regional_subfeed_fixed_route = Column(Boolean)
    organization_key = Column(String)
    organization_name = Column(String)
    organization_itp_id = Column(Integer)
    organization_hubspot_company_record_id = Column(String)
    organization_ntd_id = Column(String)
    organization_source_record_id = Column(String)
    service_key = Column(String)
    service_name = Column(String)
    service_source_record_id = Column(String)
    gtfs_service_data_customer_facing = Column(Boolean)
    regional_feed_type = Column(String)
    associated_schedule_gtfs_dataset_key = Column(String)
    schedule_gtfs_dataset_name = Column(String)
    schedule_source_record_id = Column(String)
    service_alerts_gtfs_dataset_name = Column(String)
    service_alerts_source_record_id = Column(String)
    vehicle_positions_gtfs_dataset_name = Column(String)
    vehicle_positions_source_record_id = Column(String)
    trip_updates_gtfs_dataset_name = Column(String)
    trip_updates_source_record_id = Column(String)
    schedule_gtfs_dataset_key = Column(String)
    service_alerts_gtfs_dataset_key = Column(String)
    vehicle_positions_gtfs_dataset_key = Column(String)
    trip_updates_gtfs_dataset_key = Column(String)
    _valid_from = Column(DateTime)
    _valid_to = Column(DateTime)
    _is_current = Column(Boolean)
```
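`DimProviderGtfsData` links organizations and services to their schedule and realtime datasets. A sketch of looking up the distinct schedule dataset keys for an organization's current, public, customer-facing fixed-route services, using `Result.scalars()` from SQLAlchemy 1.4+. The trimmed model copy and the helper name are hypothetical, and "public" here simply means the `public_customer_facing_fixed_route` flag is true.

```python
from __future__ import annotations

from sqlalchemy import Boolean, Column, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DimProviderGtfsData(Base):
    # Trimmed copy of the model above: only the columns this sketch uses.
    __tablename__ = "dim_provider_gtfs_data"

    key = Column(String, primary_key=True)
    organization_name = Column(String)
    public_customer_facing_fixed_route = Column(Boolean)
    schedule_gtfs_dataset_key = Column(String)
    _is_current = Column(Boolean)


def public_schedule_keys(session: Session, organization_name: str) -> list[str]:
    """Distinct schedule dataset keys for an organization's current,
    public, customer-facing fixed-route services."""
    stmt = (
        select(DimProviderGtfsData.schedule_gtfs_dataset_key)
        .where(
            DimProviderGtfsData.organization_name == organization_name,
            DimProviderGtfsData._is_current.is_(True),
            DimProviderGtfsData.public_customer_facing_fixed_route.is_(True),
            DimProviderGtfsData.schedule_gtfs_dataset_key.is_not(None),
        )
        .distinct()
        .order_by(DimProviderGtfsData.schedule_gtfs_dataset_key)
    )
    return session.execute(stmt).scalars().all()
```

For example, with an in-memory `create_engine("sqlite://")` session holding two current public rows that share schedule key `"s1"`, plus non-public and non-current rows, the helper returns `["s1"]`.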
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@

```python
from sqlalchemy import Boolean, Column, DateTime, Float, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class FctDailyFeedScheduledServiceSummary(Base):
    __tablename__ = "fct_daily_feed_scheduled_service_summary"

    service_date = Column(DateTime, primary_key=True)
    feed_key = Column(String, primary_key=True)
    gtfs_dataset_key = Column(String, primary_key=True)
    ttl_service_hours = Column(Float)
    n_trips = Column(Integer)
    first_departure_sec = Column(Integer)
    last_arrival_sec = Column(Integer)
    num_stop_times = Column(Integer)
    n_routes = Column(Integer)
    contains_warning_duplicate_stop_times_primary_key = Column(Boolean)
    contains_warning_duplicate_trip_primary_key = Column(Boolean)
    contains_warning_missing_foreign_key_stop_id = Column(Boolean)
```
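`first_departure_sec` and `last_arrival_sec` are integer columns. Assuming they count seconds from the start of the service day, as GTFS `stop_times` does (so values past 86400 are times after midnight on trips belonging to the previous service day), these hypothetical helpers render them in GTFS `HH:MM:SS` notation and compute the span between them. That span is not the same as `ttl_service_hours`, which presumably totals service time across trips.

```python
def gtfs_time(seconds: int) -> str:
    """Format seconds since the start of the service day as GTFS HH:MM:SS.

    Hours are not wrapped at 24, so 1:30 a.m. on the following morning
    is written 25:30:00, as in GTFS stop_times.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def service_span_hours(first_departure_sec: int, last_arrival_sec: int) -> float:
    """Hours between the first departure and the last arrival of the day."""
    return (last_arrival_sec - first_departure_sec) / 3600


if __name__ == "__main__":
    print(gtfs_time(18000), gtfs_time(91800), service_span_hours(18000, 91800))
```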
