Change Unzip and Validate GTFS Schedule Hourly start date to allow creating jobs for older dates
#4292
base: main
Unzip and Validate GTFS Schedule Hourly start date to allow creating jobs for older dates
4e4c340 to 34f16e4
34f16e4 to 62060b2
Terraform plan in iac/cal-itp-data-infra/composer/us
No changes. Your infrastructure matches the configuration.
📝 Plan generated in Plan Terraform for Warehouse and DAG changes #981
Terraform plan in iac/cal-itp-data-infra/airflow/us
Plan: 0 to add, 1 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
!~ update in-place
Terraform will perform the following actions:
# google_storage_bucket_object.calitp-composer["dags/unzip_and_validate_gtfs_schedule_hourly/METADATA.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-composer" {
!~ crc32c = "9/YYeQ==" -> (known after apply)
!~ detect_md5hash = "2pPOIKu+6ALlfWOnlxTJqg==" -> "different hash"
!~ generation = 1753225938235511 -> (known after apply)
id = "calitp-composer-dags/unzip_and_validate_gtfs_schedule_hourly/METADATA.yml"
!~ md5hash = "2pPOIKu+6ALlfWOnlxTJqg==" -> (known after apply)
name = "dags/unzip_and_validate_gtfs_schedule_hourly/METADATA.yml"
# (17 unchanged attributes hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
📝 Plan generated in Plan Terraform for Warehouse and DAG changes #981
Terraform plan in iac/cal-itp-data-infra-staging/airflow/us
Plan: 0 to add, 20 to change, 3 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
!~ update in-place
- destroy
Terraform will perform the following actions:
# google_storage_bucket_object.calitp-staging-composer["dags/airtable_loader_v2/generate_gtfs_download_configs.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "MijWlA==" -> (known after apply)
!~ detect_md5hash = "s632w01yc8uo408y4VdAyw==" -> "different hash"
!~ generation = 1763083861264422 -> (known after apply)
id = "calitp-staging-composer-dags/airtable_loader_v2/generate_gtfs_download_configs.py"
!~ md5hash = "s632w01yc8uo408y4VdAyw==" -> (known after apply)
name = "dags/airtable_loader_v2/generate_gtfs_download_configs.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["dags/download_gtfs_schedule_v2/download_schedule_feeds.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "8qLecA==" -> (known after apply)
!~ detect_md5hash = "iGapm0xJ3U0wowUUkId1eQ==" -> "different hash"
!~ generation = 1763083860630232 -> (known after apply)
id = "calitp-staging-composer-dags/download_gtfs_schedule_v2/download_schedule_feeds.py"
!~ md5hash = "iGapm0xJ3U0wowUUkId1eQ==" -> (known after apply)
name = "dags/download_gtfs_schedule_v2/download_schedule_feeds.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["dags/sync_ntd_data_xlsx/scrape_ntd_xlsx_urls.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "g2TBuw==" -> (known after apply)
!~ detect_md5hash = "PSes9rK7j0FP6JcNRQhPLg==" -> "different hash"
!~ generation = 1763083860841240 -> (known after apply)
id = "calitp-staging-composer-dags/sync_ntd_data_xlsx/scrape_ntd_xlsx_urls.py"
!~ md5hash = "PSes9rK7j0FP6JcNRQhPLg==" -> (known after apply)
name = "dags/sync_ntd_data_xlsx/scrape_ntd_xlsx_urls.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["dags/unzip_and_validate_gtfs_schedule_hourly/METADATA.yml"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "9/YYeQ==" -> (known after apply)
!~ detect_md5hash = "2pPOIKu+6ALlfWOnlxTJqg==" -> "different hash"
!~ generation = 1753225939203114 -> (known after apply)
id = "calitp-staging-composer-dags/unzip_and_validate_gtfs_schedule_hourly/METADATA.yml"
!~ md5hash = "2pPOIKu+6ALlfWOnlxTJqg==" -> (known after apply)
name = "dags/unzip_and_validate_gtfs_schedule_hourly/METADATA.yml"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/calitp_data_infra/__init__.py"] will be destroyed
# (because key ["plugins/calitp_data_infra/__init__.py"] is not in for_each map)
- resource "google_storage_bucket_object" "calitp-staging-composer" {
- bucket = "calitp-staging-composer" -> null
- content_type = "text/plain; charset=utf-8" -> null
- crc32c = "AAAAAA==" -> null
- detect_md5hash = "1B2M2Y8AsgTpgAmY7PhCfg==" -> null
- event_based_hold = false -> null
- generation = 1763148327866481 -> null
- id = "calitp-staging-composer-plugins/calitp_data_infra/__init__.py" -> null
- md5hash = "1B2M2Y8AsgTpgAmY7PhCfg==" -> null
- md5hexhash = "d41d8cd98f00b204e9800998ecf8427e" -> null
- media_link = "https://storage.googleapis.com/download/storage/v1/b/calitp-staging-composer/o/plugins%2Fcalitp_data_infra%2F__init__.py?generation=1763148327866481&alt=media" -> null
- metadata = {} -> null
- name = "plugins/calitp_data_infra/__init__.py" -> null
- output_name = "plugins/calitp_data_infra/__init__.py" -> null
- self_link = "https://www.googleapis.com/storage/v1/b/calitp-staging-composer/o/plugins%2Fcalitp_data_infra%2F__init__.py" -> null
- source = "../../../../airflow/plugins/calitp_data_infra/__init__.py" -> null
- storage_class = "STANDARD" -> null
- temporary_hold = false -> null
# (6 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/calitp_data_infra/auth.py"] will be destroyed
# (because key ["plugins/calitp_data_infra/auth.py"] is not in for_each map)
- resource "google_storage_bucket_object" "calitp-staging-composer" {
- bucket = "calitp-staging-composer" -> null
- content_type = "text/plain; charset=utf-8" -> null
- crc32c = "6lsUtA==" -> null
- detect_md5hash = "+/KTbwc3sd3B4wBkY+HoUw==" -> null
- event_based_hold = false -> null
- generation = 1763148327865065 -> null
- id = "calitp-staging-composer-plugins/calitp_data_infra/auth.py" -> null
- md5hash = "+/KTbwc3sd3B4wBkY+HoUw==" -> null
- md5hexhash = "fbf2936f0737b1ddc1e3006463e1e853" -> null
- media_link = "https://storage.googleapis.com/download/storage/v1/b/calitp-staging-composer/o/plugins%2Fcalitp_data_infra%2Fauth.py?generation=1763148327865065&alt=media" -> null
- metadata = {} -> null
- name = "plugins/calitp_data_infra/auth.py" -> null
- output_name = "plugins/calitp_data_infra/auth.py" -> null
- self_link = "https://www.googleapis.com/storage/v1/b/calitp-staging-composer/o/plugins%2Fcalitp_data_infra%2Fauth.py" -> null
- source = "../../../../airflow/plugins/calitp_data_infra/auth.py" -> null
- storage_class = "STANDARD" -> null
- temporary_hold = false -> null
# (6 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/calitp_data_infra/storage.py"] will be destroyed
# (because key ["plugins/calitp_data_infra/storage.py"] is not in for_each map)
- resource "google_storage_bucket_object" "calitp-staging-composer" {
- bucket = "calitp-staging-composer" -> null
- content_type = "text/plain; charset=utf-8" -> null
- crc32c = "b87VYA==" -> null
- detect_md5hash = "sq1Q+wmsL8o0RKJLFUFC7g==" -> null
- event_based_hold = false -> null
- generation = 1763152752579179 -> null
- id = "calitp-staging-composer-plugins/calitp_data_infra/storage.py" -> null
- md5hash = "sq1Q+wmsL8o0RKJLFUFC7g==" -> null
- md5hexhash = "b2ad50fb09ac2fca3444a24b154142ee" -> null
- media_link = "https://storage.googleapis.com/download/storage/v1/b/calitp-staging-composer/o/plugins%2Fcalitp_data_infra%2Fstorage.py?generation=1763152752579179&alt=media" -> null
- metadata = {} -> null
- name = "plugins/calitp_data_infra/storage.py" -> null
- output_name = "plugins/calitp_data_infra/storage.py" -> null
- self_link = "https://www.googleapis.com/storage/v1/b/calitp-staging-composer/o/plugins%2Fcalitp_data_infra%2Fstorage.py" -> null
- source = "../../../../airflow/plugins/calitp_data_infra/storage.py" -> null
- storage_class = "STANDARD" -> null
- temporary_hold = false -> null
# (6 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/hooks/kuba_hook.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "pIf6jA==" -> (known after apply)
!~ detect_md5hash = "M9n0Cr7dL9+4asfxnMHgjQ==" -> "different hash"
!~ generation = 1763148328299170 -> (known after apply)
id = "calitp-staging-composer-plugins/hooks/kuba_hook.py"
!~ md5hash = "M9n0Cr7dL9+4asfxnMHgjQ==" -> (known after apply)
name = "plugins/hooks/kuba_hook.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/hooks/soda_hook.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "Wxj+aQ==" -> (known after apply)
!~ detect_md5hash = "CDAoj9pONq2nsD5BT43vEg==" -> "different hash"
!~ generation = 1763148327870491 -> (known after apply)
id = "calitp-staging-composer-plugins/hooks/soda_hook.py"
!~ md5hash = "CDAoj9pONq2nsD5BT43vEg==" -> (known after apply)
name = "plugins/hooks/soda_hook.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/hooks/transitland_hook.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "xolZ/w==" -> (known after apply)
!~ detect_md5hash = "lMSV7OyTWTBE5ar8Ush1oA==" -> "different hash"
!~ generation = 1763148327895159 -> (known after apply)
id = "calitp-staging-composer-plugins/hooks/transitland_hook.py"
!~ md5hash = "lMSV7OyTWTBE5ar8Ush1oA==" -> (known after apply)
name = "plugins/hooks/transitland_hook.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/aggregator_to_gcs_operator.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "h26Utg==" -> (known after apply)
!~ detect_md5hash = "+2mzXgNPW36mWf5Z3ZZBag==" -> "different hash"
!~ generation = 1763148327878403 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/aggregator_to_gcs_operator.py"
!~ md5hash = "+2mzXgNPW36mWf5Z3ZZBag==" -> (known after apply)
name = "plugins/operators/aggregator_to_gcs_operator.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/blackcat_to_gcs_operator.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "s7BJ7Q==" -> (known after apply)
!~ detect_md5hash = "husvfrVLOwWESpUczaAT3w==" -> "different hash"
!~ generation = 1763148327838984 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/blackcat_to_gcs_operator.py"
!~ md5hash = "husvfrVLOwWESpUczaAT3w==" -> (known after apply)
name = "plugins/operators/blackcat_to_gcs_operator.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/dbt_manifest_to_dictionary_operator.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "4MyK1Q==" -> (known after apply)
!~ detect_md5hash = "FIHwYcjOm5NPB+nPb7enyg==" -> "different hash"
!~ generation = 1763083861606904 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/dbt_manifest_to_dictionary_operator.py"
!~ md5hash = "FIHwYcjOm5NPB+nPb7enyg==" -> (known after apply)
name = "plugins/operators/dbt_manifest_to_dictionary_operator.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/dbt_manifest_to_metadata_operator.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "UDX9og==" -> (known after apply)
!~ detect_md5hash = "F4n9FAx9ExF1Zl3J7NaMWQ==" -> "different hash"
!~ generation = 1763083860841905 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/dbt_manifest_to_metadata_operator.py"
!~ md5hash = "F4n9FAx9ExF1Zl3J7NaMWQ==" -> (known after apply)
name = "plugins/operators/dbt_manifest_to_metadata_operator.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/gtfs_csv_to_jsonl_hourly.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "K3ey8g==" -> (known after apply)
!~ detect_md5hash = "gIuCjsD9Pbg4DHjFe9VJFw==" -> "different hash"
!~ generation = 1763152752256861 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/gtfs_csv_to_jsonl_hourly.py"
!~ md5hash = "gIuCjsD9Pbg4DHjFe9VJFw==" -> (known after apply)
name = "plugins/operators/gtfs_csv_to_jsonl_hourly.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/littlepay_raw_sync_feed_v3.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "TDDjLg==" -> (known after apply)
!~ detect_md5hash = "zXyVjenO8Z0Xcx4u85EVYQ==" -> "different hash"
!~ generation = 1763083860840217 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/littlepay_raw_sync_feed_v3.py"
!~ md5hash = "zXyVjenO8Z0Xcx4u85EVYQ==" -> (known after apply)
name = "plugins/operators/littlepay_raw_sync_feed_v3.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/pod_operator.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "H81Llg==" -> (known after apply)
!~ detect_md5hash = "6vO0LHE3p5d/cOQ71Ghv8g==" -> "different hash"
!~ generation = 1763148327858203 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/pod_operator.py"
!~ md5hash = "6vO0LHE3p5d/cOQ71Ghv8g==" -> (known after apply)
name = "plugins/operators/pod_operator.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/scrape_ntd_xlsx.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "800p7w==" -> (known after apply)
!~ detect_md5hash = "+sUY5347tlkwmkjx/59Ytg==" -> "different hash"
!~ generation = 1763083860847843 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/scrape_ntd_xlsx.py"
!~ md5hash = "+sUY5347tlkwmkjx/59Ytg==" -> (known after apply)
name = "plugins/operators/scrape_ntd_xlsx.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/operators/scrape_state_geoportal.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "HSTs0g==" -> (known after apply)
!~ detect_md5hash = "kroDAzyyod9g32UxYWccew==" -> "different hash"
!~ generation = 1763083861401493 -> (known after apply)
id = "calitp-staging-composer-plugins/operators/scrape_state_geoportal.py"
!~ md5hash = "kroDAzyyod9g32UxYWccew==" -> (known after apply)
name = "plugins/operators/scrape_state_geoportal.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/scripts/gtfs_rt_parser.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "WvH4iA==" -> (known after apply)
!~ detect_md5hash = "ippeGem7GNqos47hTjqtFw==" -> "different hash"
!~ generation = 1763083860890770 -> (known after apply)
id = "calitp-staging-composer-plugins/scripts/gtfs_rt_parser.py"
!~ md5hash = "ippeGem7GNqos47hTjqtFw==" -> (known after apply)
name = "plugins/scripts/gtfs_rt_parser.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer["plugins/utils.py"] will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer" {
!~ crc32c = "PwBS8A==" -> (known after apply)
!~ detect_md5hash = "K9iPt8XD7GnoeSQoTodmng==" -> "different hash"
!~ generation = 1763152752578719 -> (known after apply)
id = "calitp-staging-composer-plugins/utils.py"
!~ md5hash = "K9iPt8XD7GnoeSQoTodmng==" -> (known after apply)
name = "plugins/utils.py"
# (17 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-catalog will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-catalog" {
!~ content = (sensitive value)
!~ crc32c = "pQ337w==" -> (known after apply)
!~ detect_md5hash = "3a4S4yKg9jPa5/IW56+cEw==" -> "different hash"
!~ generation = 1763148328386702 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/target/catalog.json"
!~ md5hash = "3a4S4yKg9jPa5/IW56+cEw==" -> (known after apply)
name = "data/warehouse/target/catalog.json"
# (16 unchanged attributes hidden)
}
# google_storage_bucket_object.calitp-staging-composer-manifest will be updated in-place
!~ resource "google_storage_bucket_object" "calitp-staging-composer-manifest" {
!~ content = (sensitive value)
!~ crc32c = "sQuodA==" -> (known after apply)
!~ detect_md5hash = "oE3f8cSibjX2kATWxk0C/A==" -> "different hash"
!~ generation = 1763148330173362 -> (known after apply)
id = "calitp-staging-composer-data/warehouse/target/manifest.json"
!~ md5hash = "oE3f8cSibjX2kATWxk0C/A==" -> (known after apply)
name = "data/warehouse/target/manifest.json"
# (16 unchanged attributes hidden)
}
Plan: 0 to add, 20 to change, 3 to destroy.
📝 Plan generated in Plan Terraform for Warehouse and DAG changes #981
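As a cross-check on the hash fields in the plan above: Cloud Storage reports an object's MD5 as the base64 encoding of the raw digest, which is what `md5hash` and `detect_md5hash` show, while `md5hexhash` is the same digest written in hex. The sketch below converts between the two forms using only the Python standard library. The helper names are illustrative and are not part of Terraform or this repository.

```python
import base64
import hashlib


def md5_base64_to_hex(md5_b64: str) -> str:
    """Decode a base64 MD5 (as in `md5hash`) into its hex form (as in `md5hexhash`)."""
    return base64.b64decode(md5_b64).hex()


def md5_base64_of(content: bytes) -> str:
    """Compute the base64-encoded MD5 digest of some content."""
    digest = hashlib.md5(content).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Values copied from the destroyed plugins/calitp_data_infra/__init__.py object.
    print(md5_base64_to_hex("1B2M2Y8AsgTpgAmY7PhCfg=="))
    # The same base64 value is what empty content hashes to.
    print(md5_base64_of(b""))
```

Applied to the plan, the hash of the destroyed `plugins/calitp_data_infra/__init__.py` object decodes to the MD5 of empty content, which is consistent with an empty `__init__.py` file.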
62060b2 to 148cd56
|
I would like this to go back to |
148cd56 to 71b6b0e
71b6b0e to 50858ad
vevetron
left a comment
Can we have this go back to 2024-01-01?
There are unzip failures in 2024 for schedule dates that I would like to have rebuilt.
50858ad to 16035a6
Terraform plan in iac/cal-itp-data-infra-staging/composer/us
Plan: 0 to add, 1 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
!~ update in-place
Terraform will perform the following actions:
# google_composer_environment.calitp-staging-composer will be updated in-place
!~ resource "google_composer_environment" "calitp-staging-composer" {
id = "projects/cal-itp-data-infra-staging/locations/us-west2/environments/calitp-staging-composer"
name = "calitp-staging-composer"
# (5 unchanged attributes hidden)
!~ config {
# (8 unchanged attributes hidden)
!~ software_config {
!~ pypi_packages = {
+ "calitp-data-infra" = "==2025.6.5"
+ "pydantic" = ">=1.9,<2.0"
# (11 unchanged elements hidden)
}
# (6 unchanged attributes hidden)
# (1 unchanged block hidden)
}
# (8 unchanged blocks hidden)
}
# (1 unchanged block hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
📝 Plan generated in Plan Terraform for Warehouse and DAG changes #981
16035a6 to b9ecd44
vevetron
left a comment
Thank you!
…n dates from beginning of 2024 [#4294]
b9ecd44 to bfddab3
Description
This PR changes the Unzip and Validate GTFS Schedule Hourly start date to allow run dates from January 1, 2024. [#4294]
Also related to #4471 - @vevetron
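To illustrate why the start date matters, here is a simplified model of the scheduling rule, not Airflow code: a run is only eligible when its logical date falls on or after the DAG's start date. `NEW_START` reflects the January 1, 2024 date from this description (UTC is an assumption). `OLD_START` and the function names are hypothetical, since the previous value is not shown in this thread.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator

# Start date from this PR's description; the UTC timezone is an assumption.
NEW_START = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Hypothetical earlier setting, used only for contrast.
OLD_START = datetime(2025, 1, 1, tzinfo=timezone.utc)


def can_create_run(logical_date: datetime, start_date: datetime) -> bool:
    """A run is eligible only if its logical date is not before the start date."""
    return logical_date >= start_date


def hourly_run_dates(first: datetime, last: datetime) -> Iterator[datetime]:
    """Yield every hourly logical date from `first` through `last`, inclusive."""
    current = first
    while current <= last:
        yield current
        current += timedelta(hours=1)


if __name__ == "__main__":
    wanted = datetime(2024, 3, 15, 7, tzinfo=timezone.utc)
    print("eligible with new start date:", can_create_run(wanted, NEW_START))
    print("eligible with old start date:", can_create_run(wanted, OLD_START))
```

Under this model, any 2024 date that needs rebuilding becomes eligible once the start date moves back to January 1, 2024, which matches the reviewer's request.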
Type of change
How has this been tested?
It is just a config setting.
Post-merge follow-ups
Check if the unzip_and_validate_gtfs_schedule_hourly DAG is able to run the dates needed in production.