- 
                Notifications
    You must be signed in to change notification settings 
- Fork 196
[CI] Migrate to pyproject.toml and use poetry for deterministic builds #201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[CI] Migrate to pyproject.toml and use poetry for deterministic builds #201
Conversation
| Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
f7bec24    to
    7489e74      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
    
      
        2 similar comments
      
    
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
| Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
b7ddd49    to
    2855fda      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
2855fda    to
    82deefb      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
82deefb    to
    2a376c6      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
2a376c6    to
    030fe75      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
    
      
        1 similar comment
      
    
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
c3830b4    to
    39f857c      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
    
      
        1 similar comment
      
    
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
| - name: "Install pre-commit" | ||
| run: pip install pre-commit | ||
|  | ||
| - name: "Run pre-commit checks" | ||
| run: pre-commit run --all-files | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally keep linting and static analysis jobs separate from jobs which run tests. Makes it easier to see what failed.
It looks like now pre-commit runs as part of the test jobs themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hashhar the reason is to try to adhere to the DRY principle and ensure that local and remote CI are consistent. In this instance rather than explicitly calling out each test/step in the GitHub actions this is now all defined by tox. Any updates to tox.ini will be reflected in the remote CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While that is a reasonable benefit it has it's downsides as well - namely that in case of failures you need to go to the CI run logs and see what exactly failed and unless you know what to search/grep for it's painful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are pros and cons with everything. In general adhering to the KISS principle—simplifying logic and ensure consistency—likely out weights the cons. It's worth noting the errors are present in the logs, albeit maybe having to toggle an additional chevron.
| - name: Run tox | ||
| run: | | ||
| pytest -s tests/ | ||
| poetry run tox --parallel | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is --parallel needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hashhar rather than multiple GitHub jobs this workflow is configured to have only one job and thus from a wall-clock perspective having the relevant tox environments run in parallel is more desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having single job makes it harder to inspect which env is failing without having to go to the actions page, and check the logs. Any way we can preserve current matrix behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #201 (comment). I'm really not certain whether the hypothesized concerns are problematic in reality. There's very little friction in introspecting the GitHub workflow logs to determine why CI is failing.
|  | ||
| engine = create_engine("trino://<username>@<host>:<port>/<catalog>/<schema>?access_token=<jwt_token>") | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract no-op changes to separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hashhar this is likely a default from my editor. I'll revert it. Adding the prettier pre-commit hook helps to ensure files are formatted in a consistent manner which long term would prevent these types of no-op changes.
|  | ||
| ``` | ||
| $ pytest tests/unit | ||
| $ poetry run tox -e <environment> -- tests/unit | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to have reasonable default so that when developing locally contributors don't need to think about what env to use. CI will do the exhaustive tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hashhar local development likely wants to run multiple environments, i.e., py39, pre-commit, etc. Personally having used tox for a number of years I have no issue with explicitly invoking the Python specific environment when running the unit/integration tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should reduce friction for contributors (another benefit being that everyone running tox will see same env unless they specify explicitly). Generally people shouldn't need to care what env they are testing against unless the issue is env specific (then they can provide their own choice of env).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure explicitly specifying an environment is friction. It's worth noting that local CI likely requires multiple environments, i.e., a common workflow could be to first run tox -e pre-commit to ensure linting, typing, etc. is correct before running the test via tox -e py39 say.
One could argue having a these environments—a subset of which can be run via the -e argument—actually reduces friction as it obfuscates the actual commands from the user, i.e., rather than the user needing to know they need to run the pytest, pre-commit, etc. commands (along with the relevant flags/arguments) they simply invoke the respective environment through the tox CLI which standardizes testing workflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps adding some examples like poetry run tox -e py39 -- tests/unit could be a reasonable consensus? Contributors usually follow README first, or we can have a sample local testing workflow described like run pre-commit then running tests etc.
| rm -rf dist/ && | ||
| ./setup.py sdist bdist_wheel && | ||
| twine upload dist/* && | ||
| poetry publish --build && | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any possibility of clearing old state/artifacts for safety?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hashhar I'm unsure. There's python-poetry/poetry#1329, however it has been open for some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the rm -rf dist/ in the steps - I verified that Poetry indeed doesn't clean up old artifacts.
        
          
                .pre-commit-config.yaml
              
                Outdated
          
        
      | - "types-requests" | ||
|  | ||
| - repo: "https://github.com/python-poetry/poetry" | ||
| rev: "1.2.0b3" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's switch to stable release before merging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hovaesco the reason for including a pre-release it it exposes dependency groups. It's not clear from python-poetry/poetry#5586 when the 1.2.0 release is scheduled.
Are there any major concerns with using a pre-release in the interim especially if it seems to fulfill our needs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-releases can always change and impact our current approach. Testing with pre-release makes lots of sense but I would opt to use official release in the project.
        
          
                README.md
              
                Outdated
          
        
      | $ python3 -m venv .venv | ||
| $ . .venv/bin/activate | ||
| $ pip install . | ||
| $ curl -sSL https://install.python-poetry.org | python3 - --preview | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's wait for the official release before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment.
        
          
                pyproject.toml
              
                Outdated
          
        
      | @@ -0,0 +1,66 @@ | |||
| [build-system] | |||
| requires = ["poetry-core>=1.1.0b3"] | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be 1.2.0b3 as for now?
        
          
                pyproject.toml
              
                Outdated
          
        
      |  | ||
| [tool.poetry] | ||
| name = "trino-python-client" | ||
| version = "0.314.0" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some changes in versioning guides in README.md.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look here
| pytz = "*" | ||
| requests = "*" | ||
| requests_kerberos = { version = "*", optional = true } | ||
| sqlalchemy = { version = "~1.4", optional = true } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was ~1.3 in setup.py previously, is there any specific reason for that change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hovaesco this is precisely the reason for this change, i.e., to ensure deterministic builds. Specifically when this was set to ~1.3 running,
poetry run tox -e py39 -- tests/unit
would result in the following error:
Traceback:
../../../.pyenv/versions/3.9.4/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/sqlalchemy/test_dialect.py:5: in <module>
    from sqlalchemy.engine import make_url
E   ImportError: cannot import name 'make_url' from 'sqlalchemy.engine'
In SQLAlchemy v1.4 they exposed the make_url function to the to the sqlachemy.engine module as previously in v1.3 one would need to import this via,
from sqlalchemy.engine.url import make_url
i.e., v1.3 is not viable given how the current code is written and thus the lower bound was wrong. This logic was added in da1441f which defined the sqlalechemy~=1.3 requirement (note the inclusion of the = which differs from the Poetry dependency), which—per PEP 440—implies,
For a given release identifier V.N , the compatible release clause is approximately equivalent to the pair of comparison clauses:
>= V.N, == V.*
thus the sqlalechemy~=1.3 requirement is equivalent to sqlachemy>=1.3, == 1.*. If you look at a GitHub workflow CI run (example) you can see,
Collecting sqlalchemy~=1.3
  Downloading SQLAlchemy-1.4.39-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
that the CI is installing the latest version (v1.4.39) which adheres to the requirement specification.
Note in Poetry the tilde requirements  (which don't include the =) differ slightly, so ~1.3 is equivalent to >= 1.3.0, < 1.4.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the detailed explanation.
ac8dd71    to
    83938b0      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
83938b0    to
    4932a79      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
4932a79    to
    d2437ef      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
d2437ef    to
    8cca8e3      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
8cca8e3    to
    f3e84eb      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
f3e84eb    to
    70cb0b1      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
| - name: "Install pre-commit" | ||
| run: pip install pre-commit | ||
|  | ||
| - name: "Run pre-commit checks" | ||
| run: pre-commit run --all-files | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If those steps are removed, then I assume checks jobs is redundant. Outstanding jobs step be moved to build job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it would be nice to still execute pre-commit checks in checks job. Then build could be run only if checks job is successful. It will reduce runners usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hovaesco is it necessary to split out the checks vs the build? I agree that one could argue that pre-commit performs pre-flight checks, however i) it does mean the DRY principle isn't really adhered to, and ii) you could argue (per the name) that these should actually be first run locally (alongside all the other local CI checks) at the time of the commit. This is actually the most efficient use of resources and results in faster iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot control behaviour of people who submit PRs. It's still useful to let people submit PRs in an imperfect shape and guide them to the finish line so that their next contribution is easier.
Also keeping the checks split has two benefits:
- Lesser wall-time for each check.
- Easy to see at a glance what failed with zero additional clicks (that's also why a matrix was used instead of tox for the version based tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with arguments given @hashhar that CI should still have those jobs split out + it will reduce runners usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One general comment, could you please squash commits (if possible) and rename commit messages?
70cb0b1    to
    3d2464c      
    Compare
  
    3d2464c    to
    f97d3da      
    Compare
  
    | Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: John Bodley. 
 | 
f97d3da    to
    1226d39      
    Compare
  
    1226d39    to
    4545d7c      
    Compare
  
    f4d114e    to
    11858f7      
    Compare
  
    11858f7    to
    ecc5a3b      
    Compare
  
    
Somewhat of a yak shaving exercise as I was initially going to look into #199 though realized that the
trino-python-clientpackage has no deterministic dependency management. See the Slack discussion here for more details.There's a slew of frameworks for handling Python dependencies including Pipenv, pip-tools, etc. however it seems like Poetry is well supported and integrates with
pyproject.toml—a replacement for the legacysetup.py. Per #201 (comment), the importance is having a dependency manager rather than the dependency manager. If necessary is should be fairly trivial to later migrate to hatch or similar.This PR:
setup.pyin favor ofpyproject.toml.poetry.lockfile.tox—ensuring a consistent local and remote CI workflow.to: @bitsondatadev @hashhar