Merge pull request #84 from Hynn01/main
Rename data-leakage checker and Update randomness-control checkers
Hynn01 authored May 27, 2022
2 parents 803575f + 2e30dcf commit b64433a
Showing 16 changed files with 153 additions and 54 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -40,15 +40,15 @@ hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,\
deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,\
randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,\
missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,\
forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,\
forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,\
dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch \
--output-format=json:report.json,text:report.txt,colorized \
--output-format=text:report.txt,colorized \
--reports=y \
<path_to_sources>
```
[For Windows Users]:
```
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=json:report.json,text:report.txt,colorized --reports=y <path_to_sources>
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=text:report.txt,colorized --reports=y <path_to_sources>
```
Or place a [`.pylintrc` configuration file](https://github.com/Hynn01/dslinter/blob/main/docs/pylint-configuration-examples/pylintrc-with-only-dslinter-settings/.pylintrc) containing the above settings in the folder where you run your command, and run:
```
@@ -141,7 +141,7 @@ poetry run pytest .

- **W5517 | gradient-clear-pytorch | Gradient Clear Checker(PyTorch)**: `loss_fn.backward()` and `optimizer.step()` should be used together with `optimizer.zero_grad()`. If `.zero_grad()` is missing from the code, the rule is violated.
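
For illustration only (not part of the commit): a minimal training-loop sketch, with hypothetical `model`, `optimizer`, and `loss_fn` objects, in which `optimizer.zero_grad()` accompanies `loss.backward()` and `optimizer.step()` as the rule expects:
```python
import torch

# Hypothetical model, optimizer, loss, and data, shown only to illustrate the rule.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

for epoch in range(5):
    optimizer.zero_grad()                      # clear accumulated gradients first
    loss = loss_fn(model(inputs), targets)
    loss.backward()                            # backward pass
    optimizer.step()                           # parameter update
```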

- **W5518 | data-leakage-scikitlearn | Data Leakage Checker(ScikitLearn)**: All scikit-learn estimators should be used inside Pipelines, to prevent data leakage between training and test data.
- **W5518 | pipeline-not-used-scikitlearn | Pipeline Checker(ScikitLearn)**: All scikit-learn estimators should be used inside Pipelines, to prevent data leakage between training and test data.
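
For illustration only (not part of the commit): a minimal sketch of the pattern this rule asks for, combining the preprocessor and the estimator in a single `Pipeline` so the scaler is fit on the training split only:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaler and estimator live in one Pipeline, so the scaler is fit on the
# training split only and merely applied to the test split.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))
```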

- **W5519 | dependent-threshold-scikitlearn | Dependent Threshold Checker(ScikitLearn)**: If a threshold-dependent evaluation metric (e.g., F-score) is used in the code, check whether a threshold-independent evaluation metric (e.g., AUC) is also used in the code.
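
For illustration only (not part of the commit): a small sketch that reports a threshold-independent metric (ROC AUC) alongside a threshold-dependent one (F1), which is the combination this rule looks for:
```python
from sklearn.metrics import f1_score, roc_auc_score

# y_true: ground-truth labels; y_prob: predicted probabilities for the
# positive class; y_pred: labels obtained by thresholding y_prob at 0.5.
y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.6, 0.4, 0.9]
y_pred = [int(p >= 0.5) for p in y_prob]

print("F1 (threshold-dependent):", f1_score(y_true, y_pred))
print("AUC (threshold-independent):", roc_auc_score(y_true, y_prob))
```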

12 changes: 6 additions & 6 deletions STEPS_TO_FOLLOW.md
@@ -27,15 +27,15 @@ hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,\
deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,\
randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,\
missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,\
forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,\
forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,\
dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch \
--output-format=json:report.json,text:report.txt,colorized \
--output-format=text:report.txt,colorized \
--reports=y \
<path_to_the_project>
```
[For Windows Users]:
```
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=json:report.json,text:report.txt,colorized --reports=y <path_to_sources>
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=text:report.txt,colorized --reports=y <path_to_sources>
```

## For Notebook:
@@ -67,13 +67,13 @@ hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,\
deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,\
randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,\
missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,\
forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,\
forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,\
dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch \
--output-format=json:report.json,text:report.txt,colorized \
--output-format=text:report.txt,colorized \
--reports=y \
<path_to_the_python_file>
```
[For Windows Users]:
```
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=json:report.json,text:report.txt,colorized --reports=y <path_to_the_python_file>
pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,pipeline-not-used-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=text:report.txt,colorized --reports=y <path_to_the_python_file>
```
19 changes: 15 additions & 4 deletions dslinter/checkers/deterministic_pytorch.py
@@ -54,10 +54,15 @@ def visit_module(self, module: astroid.Module):
if _import_pytorch is False:
_import_pytorch = has_import(node, "torch")

if isinstance(node, astroid.nodes.Expr) and hasattr(node, "value"):
call_node = node.value
if isinstance(node, astroid.nodes.Expr):
if _has_deterministic_algorithm_option is False:
_has_deterministic_algorithm_option = self._check_deterministic_algorithm_option(call_node)
_has_deterministic_algorithm_option = self._check_deterministic_algorithm_option_in_expr_node(node)

if isinstance(node, astroid.nodes.FunctionDef):
for nod in node.body:
if isinstance(nod, astroid.nodes.Expr):
if _has_deterministic_algorithm_option is False:
_has_deterministic_algorithm_option = self._check_deterministic_algorithm_option_in_expr_node(nod)

# check if the rules are violated
if(
@@ -70,7 +75,13 @@ def visit_module(self, module: astroid.Module):
ExceptionHandler.handle(self, module)

@staticmethod
def _check_deterministic_algorithm_option(call_node: astroid.Call):
def _check_deterministic_algorithm_option_in_expr_node(expr_node: astroid.Expr):
if hasattr(expr_node, "value"):
call_node = expr_node.value
return DeterministicAlgorithmChecker._check_deterministic_algorithm_option_in_call_node(call_node)

@staticmethod
def _check_deterministic_algorithm_option_in_call_node(call_node: astroid.Call):
# if torch.use_deterministic_algorithm() is called and the argument is True,
# set _has_deterministic_algorithm_option to True
if(
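
For context only (not part of the commit): with the function-body handling added above, the checker also scans top-level function bodies rather than only module-level expressions. A minimal, hypothetical sketch of the kind of code the updated checker is meant to recognize (assuming it matches the `torch.use_deterministic_algorithms` call):
```python
import torch

def main():
    # Determinism option set inside a function body; the updated checker
    # walks top-level function bodies, so this call can now be detected.
    torch.use_deterministic_algorithms(True)
    # ... build model and train ...

if __name__ == "__main__":
    main()
```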
dslinter/checkers/data_leakage_scikitlearn.py → dslinter/checkers/pipeline_scikitlearn.py
@@ -10,17 +10,17 @@
from dslinter.utils.resources import Resources


class DataLeakageScikitLearnChecker(BaseChecker):
class PipelineScikitLearnChecker(BaseChecker):
"""Checker which checks rules for preventing data leakage between training and test data."""

__implements__ = IAstroidChecker

name = "data-leakage-scikitlearn"
name = "pipeline-not-used-scikitlearn"
priority = -1
msgs = {
"W5518": (
"There are both preprocessing and estimation operations in the code, but they are not used in a pipeline.",
"data-leakage-scikitlearn",
"pipeline-not-used-scikitlearn",
"Scikit-learn preprocessors and estimators should be used inside pipelines, to prevent data leakage between training and test data.",
),
}
@@ -84,7 +84,7 @@ def visit_call(self, call_node: astroid.Call):
if self._expr_is_preprocessor(value.func.expr):
has_preprocessing_function = True
if has_learning_function is True and has_preprocessing_function is True:
self.add_message("data-leakage-scikitlearn", node=call_node)
self.add_message("pipeline-not-used-scikitlearn", node=call_node)

except: # pylint: disable=bare-except
ExceptionHandler.handle(self, call_node)
@@ -98,14 +98,14 @@ def _expr_is_estimator(expr: astroid.node_classes.NodeNG) -> bool:
:return: True when the expression is an estimator.
"""
if isinstance(expr, astroid.Call) \
and DataLeakageScikitLearnChecker._call_initiates_estimator(expr):
and PipelineScikitLearnChecker._call_initiates_estimator(expr):
return True

# If expr is a Name, check whether that name is assigned to an estimator.
if isinstance(expr, astroid.Name):
values = AssignUtil.assignment_values(expr)
for value in values:
if DataLeakageScikitLearnChecker._expr_is_estimator(value):
if PipelineScikitLearnChecker._expr_is_estimator(value):
return True
return False

@@ -120,7 +120,7 @@ def _call_initiates_estimator(call: astroid.Call) -> bool:
return (
call.func is not None
and hasattr(call.func, "name")
and call.func.name in DataLeakageScikitLearnChecker._get_estimator_classes()
and call.func.name in PipelineScikitLearnChecker._get_estimator_classes()
)

@staticmethod
19 changes: 15 additions & 4 deletions dslinter/checkers/randomness_control_numpy.py
@@ -60,10 +60,15 @@ def visit_module(self, module: astroid.Module):
if _import_ml_libraries is False:
_import_ml_libraries = has_importfrom_sklearn(node)

if isinstance(node, astroid.nodes.Expr) and hasattr(node, "value"):
call_node = node.value
if isinstance(node, astroid.nodes.Expr):
if _has_numpy_manual_seed is False:
_has_numpy_manual_seed = self._check_numpy_manual_seed(call_node)
_has_numpy_manual_seed = self._check_numpy_manual_seed_in_expr_node(node)

if isinstance(node, astroid.nodes.FunctionDef):
for nod in node.body:
if isinstance(nod, astroid.nodes.Expr):
if _has_numpy_manual_seed is False:
_has_numpy_manual_seed = self._check_numpy_manual_seed_in_expr_node(nod)

# check if the rules are violated
if(
@@ -76,7 +81,13 @@ def visit_module(self, module: astroid.Module):
ExceptionHandler.handle(self, module)

@staticmethod
def _check_numpy_manual_seed(call_node: astroid.Call):
def _check_numpy_manual_seed_in_expr_node(expr_node: astroid.Expr):
if hasattr(expr_node, "value"):
call_node = expr_node.value
return RandomnessControlNumpyChecker._check_numpy_manual_seed_in_call_node(call_node)

@staticmethod
def _check_numpy_manual_seed_in_call_node(call_node: astroid.Call):
if(
hasattr(call_node, "func")
and hasattr(call_node.func, "attrname")
19 changes: 15 additions & 4 deletions dslinter/checkers/randomness_control_pytorch.py
@@ -53,10 +53,15 @@ def visit_module(self, module: astroid.Module):
if _import_pytorch is False:
_import_pytorch = has_import(node, "torch")

if isinstance(node, astroid.nodes.Expr) and hasattr(node, "value"):
call_node = node.value
if isinstance(node, astroid.nodes.Expr):
if _has_pytorch_manual_seed is False:
_has_pytorch_manual_seed = self._check_pytorch_manual_seed(call_node)
_has_pytorch_manual_seed = self._check_pytorch_manual_seed_in_expr_node(node)

if isinstance(node, astroid.nodes.FunctionDef):
for nod in node.body:
if isinstance(nod, astroid.nodes.Expr):
if _has_pytorch_manual_seed is False:
_has_pytorch_manual_seed = self._check_pytorch_manual_seed_in_expr_node(nod)

# check if the rules are violated
if(
@@ -68,7 +73,13 @@ def visit_module(self, module: astroid.Module):
ExceptionHandler.handle(self, module)

@staticmethod
def _check_pytorch_manual_seed(call_node: astroid.Call):
def _check_pytorch_manual_seed_in_expr_node(expr_node: astroid.Expr):
if hasattr(expr_node, "value"):
call_node = expr_node.value
return RandomnessControlPytorchChecker._check_pytorch_manual_seed_in_call_node(call_node)

@staticmethod
def _check_pytorch_manual_seed_in_call_node(call_node: astroid.Call):
if(
hasattr(call_node, "func")
and hasattr(call_node.func, "attrname")
19 changes: 15 additions & 4 deletions dslinter/checkers/randomness_control_tensorflow.py
@@ -53,10 +53,15 @@ def visit_module(self, module: astroid.Module):
if _import_tensorflow is False:
_import_tensorflow = has_import(node, "tensorflow")

if isinstance(node, astroid.nodes.Expr) and hasattr(node, "value"):
call_node = node.value
if isinstance(node, astroid.nodes.Expr):
if _has_tensorflow_manual_seed is False:
_has_tensorflow_manual_seed = self._check_tensorflow_manual_seed(call_node)
_has_tensorflow_manual_seed = self._check_tensorflow_manual_seed_in_expr_node(node)

if isinstance(node, astroid.nodes.FunctionDef):
for nod in node.body:
if isinstance(nod, astroid.nodes.Expr):
if _has_tensorflow_manual_seed is False:
_has_tensorflow_manual_seed = self._check_tensorflow_manual_seed_in_expr_node(nod)

# check if the rules are violated
if(
@@ -68,7 +73,13 @@ def visit_module(self, module: astroid.Module):
ExceptionHandler.handle(self, module)

@staticmethod
def _check_tensorflow_manual_seed(call_node: astroid.Call):
def _check_tensorflow_manual_seed_in_expr_node(expr_node: astroid.Expr):
if hasattr(expr_node, "value"):
call_node = expr_node.value
return RandomnessControlTensorflowChecker._check_tensorflow_manual_seed_in_call_node(call_node)

@staticmethod
def _check_tensorflow_manual_seed_in_call_node(call_node: astroid.Call):
if(
hasattr(call_node, "func")
and hasattr(call_node.func, "attrname")
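
For context only (not part of the commit): the NumPy, PyTorch, and TensorFlow randomness-control checkers above receive the same function-body handling, so manual seeds set inside a helper such as the hypothetical `set_seeds()` below should now be picked up as well:
```python
import numpy as np
import tensorflow as tf
import torch

def set_seeds(seed: int = 42) -> None:
    # Manual seeds set inside a function body instead of at module level.
    np.random.seed(seed)
    torch.manual_seed(seed)
    tf.random.set_seed(seed)

set_seeds()
```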
4 changes: 2 additions & 2 deletions dslinter/plugin.py
@@ -27,7 +27,7 @@
from dslinter.checkers.unnecessary_iteration_pandas import UnnecessaryIterationPandasChecker
from dslinter.checkers.unnecessary_iteration_tensorflow import UnnecessaryIterationTensorflowChecker
from dslinter.checkers.deterministic_pytorch import DeterministicAlgorithmChecker
from dslinter.checkers.data_leakage_scikitlearn import DataLeakageScikitLearnChecker
from dslinter.checkers.pipeline_scikitlearn import PipelineScikitLearnChecker
from dslinter.checkers.hyperparameters_pytorch import HyperparameterPyTorchChecker
from dslinter.checkers.hyperparameters_tensorflow import HyperparameterTensorflowChecker
# pylint: disable = line-too-long
@@ -58,7 +58,7 @@ def register(linter):
linter.register_checker(RandomnessControlDataloaderPytorchChecker(linter))
linter.register_checker(RandomnessControlTensorflowChecker(linter))
linter.register_checker(RandomnessControlNumpyChecker(linter))
linter.register_checker(DataLeakageScikitLearnChecker(linter))
linter.register_checker(PipelineScikitLearnChecker(linter))
linter.register_checker(DependentThresholdPytorchChecker(linter))
linter.register_checker(DependentThresholdTensorflowChecker(linter))
linter.register_checker(DependentThresholdScikitLearnChecker(linter))