Commit

Merge pull request #106 from Gallaecio/trust-env
Support driving Zyte API requests through a proxy
kmike authored Jul 14, 2023
2 parents 0c8b648 + 4361ca7 commit 28927c0
Showing 4 changed files with 33 additions and 1 deletion.
3 changes: 3 additions & 0 deletions CHANGES.rst
@@ -18,6 +18,9 @@ Changes

* Improved result caching in the scrapy-poet provider.

+* Added a new setting, ``ZYTE_API_USE_ENV_PROXY``, which can be set to ``True``
+  to access Zyte API using a proxy configured in the local environment.

* Fixed getting the Scrapy Cloud job ID.

* Improved the documentation.
8 changes: 8 additions & 0 deletions README.rst
@@ -913,3 +913,11 @@ We are planning to solve these problems in the future releases of
``scrapy-poet`` and ``scrapy-zyte-api``.

.. _scrapy-poet provider: https://scrapy-poet.readthedocs.io/en/stable/providers.html


+Running behind a proxy
+======================
+
+If you require a proxy to access Zyte API (e.g. a corporate proxy), configure
+the ``HTTP_PROXY`` and ``HTTPS_PROXY`` environment variables accordingly, and
+set the ``ZYTE_API_USE_ENV_PROXY`` setting to ``True``.
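
As a rough illustration of the setup the README text describes, the sketch below enables the setting and checks which proxies Python's standard library detects from the environment. The proxy URL and the helper name `proxies_from` are hypothetical, and the check uses `urllib.request` rather than scrapy-zyte-api itself, on the assumption that the underlying HTTP session reads the same variables once `trust_env` is enabled.

```python
import os
from unittest import mock
from urllib.request import getproxies_environment

# Hypothetical proxy address, used only for illustration.
PROXY_URL = "http://proxy.example.com:3128"

# In the Scrapy project's settings.py (setting name taken from the README text).
ZYTE_API_USE_ENV_PROXY = True


def proxies_from(env):
    """Return the proxies the standard library detects for the given environment."""
    # patch.dict with clear=True temporarily replaces os.environ, so the
    # result depends only on the variables passed in.
    with mock.patch.dict(os.environ, env, clear=True):
        return getproxies_environment()


proxies = proxies_from({"HTTP_PROXY": PROXY_URL, "HTTPS_PROXY": PROXY_URL})
print(proxies)
```

A check like this can help confirm that the variables are visible to the crawler process before debugging the crawler itself; it does not verify that the proxy is reachable.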
5 changes: 4 additions & 1 deletion scrapy_zyte_api/handler.py
@@ -84,7 +84,10 @@ def __init__(
         self._param_parser = _ParamParser(crawler)
         self._retry_policy = _load_retry_policy(settings)
         self._stats = crawler.stats
-        self._session = create_session(connection_pool_size=self._client.n_conn)
+        self._session = create_session(
+            connection_pool_size=self._client.n_conn,
+            trust_env=settings.getbool("ZYTE_API_USE_ENV_PROXY"),
+        )
         self._must_log_request = settings.getbool("ZYTE_API_LOG_REQUESTS", False)
         self._truncate_limit = settings.getint("ZYTE_API_LOG_REQUESTS_TRUNCATE", 64)
         if self._truncate_limit < 0:
18 changes: 18 additions & 0 deletions tests/test_handler.py
@@ -430,3 +430,21 @@ def test_log_request_truncate_negative(enabled):
settings=None,
crawler=crawler,
)


+@pytest.mark.parametrize("enabled", [True, False, None])
+def test_trust_env(enabled):
+    settings: Dict[str, Any] = {
+        **SETTINGS,
+    }
+    if enabled is not None:
+        settings["ZYTE_API_USE_ENV_PROXY"] = enabled
+    else:
+        enabled = False
+    crawler = get_crawler(settings_dict=settings)
+    handler = create_instance(
+        ScrapyZyteAPIDownloadHandler,
+        settings=None,
+        crawler=crawler,
+    )
+    assert handler._session._trust_env == enabled
