Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not only comments are scanned #429

Closed
hoijui opened this issue Oct 22, 2021 · 8 comments
Closed

Not only comments are scanned #429

hoijui opened this issue Oct 22, 2021 · 8 comments
Labels
enhancement New feature or request

Comments

@hoijui
Copy link
Contributor

hoijui commented Oct 22, 2021

The reuse tool fails to scan the following BASH file because of the echo "SPDX-... lines,
no matter whether the file is called tst, tst.sh or tst.bash.

#!/usr/bin/env bash

# SPDX-FileCopyrightText: 2021 John Doe
#
# SPDX-License-Identifier: CC0-1.0

AUTHOR="John Doe"
LICENSE="CC0-1.0"

{
	echo "<!--"
	echo "SPDX-FileCopyrightText: $(date +'%Y') $AUTHOR"
	echo
	echo "SPDX-License-Identifier: $LICENSE"
	echo "-->"
	echo
	echo "# My Project"
} > /dev/null # In reality there would be a file-name here,
              # but we are not interested in the output of this script,
              # only in the effect of scanning this script with the reuse tool

As the prefix to the trouble-some lines is echo ",
which is not a valid comment line in any format,
I imagine it should be technically possible to make this work,
at least in theory.

@hoijui
Copy link
Contributor Author

hoijui commented Oct 24, 2021

I nailed it down to the function def extract_spdx_info(text)
at https://github.com/fsfe/reuse-tool/blob/master/src/reuse/_util.py#L176.

It takes a text as argument (usually the whole file content) and scans for the SPDX identifiers in the whole text.

To fix this issue, either this function would have to be given only the text within comments, or it would have to be given the comment style as argument too, and then make sure it only considers SPDX identifiers found in actual comments.

What do you think?
I could make a PR once we could agree on the way to handle this.

@mxmehl
Copy link
Member

mxmehl commented Oct 28, 2021

Yeah, we've had this problem a couple of times. There are some ugly tricks to circumvent, e.g.:

expected = cleandoc(
"""
# Hello, world!
#
# spdx-FileCopyrightText: Mary Sue
#
# spdx-License-Identifier: GPL-3.0-or-later
"""
).replace("spdx", "SPDX")

or

spdx_info = SpdxInfo(
{"GPL-3.0-or-later"}, {"SPDX" "-FileCopyrightText: Mary Sue"}
)

Obviously, that is not satisfying. Therefore, in fsfe/reuse-docs#70 we try to define solutions. E.g. have a look at option 3 which is currently the most preferred. A .licence file would overwrite the file content and thereby get rid of any false-positive findings or even errors.

Your solution would be nice as well. However, can we be sure to avoid false-positives and especially false-negatives? The REUSE tool defines recommended comment syntaxes for file types, but for some languages there may be multiple options. E.g. take this bad example of a PHP file:

<html>
  <!-- SPDX-License-Identifier: MIT -->
  <head><title>My website</title></head>
  <body>
    <?php
      /* SPDX-License-Identifier: Apache-2.0 */
      // SPDX-FileCopyrightText: foobar
      echo date("Ymd");
    ?>
  </body>
</html>

We have multiple comments styles here: PHP multiline, PHP singleline, and HTML. Ideas and input welcome how to tackle this!

@mxmehl mxmehl added the enhancement New feature or request label Nov 3, 2021
@Potherca
Copy link

Regarding PHP, I'd say: start supporting the most common use-cases first and incrementally add more if/when people start complaining.

I think HTML based PHP templates should probably be mostly left alone, as the developer will have to decide for themselves if they want HTML or PHP comment regarding their copyright...

The most common use-case of modern PHP would be class files. I'd suggest a file-level doc-block should suffice:

<?php 
/**
 * SPDX-FileCopyrightText: <year> <author>
 * SPDX-License-Identifier: <license>
 */

That would also make it possible to re-use/extend such a doc-block with other PhpDocumentor tags.

@mxmehl
Copy link
Member

mxmehl commented Jan 22, 2022

Would be solved by #463

@linozen
Copy link
Member

linozen commented May 16, 2022

Would be solved by #463

@mxmehl Now, that #463 is merged, can this be closed here as well?

@mxmehl
Copy link
Member

mxmehl commented May 16, 2022

Absolutely, that is the next best solution for these kinds of issues.

@mxmehl mxmehl closed this as completed May 16, 2022
@hoijui
Copy link
Contributor Author

hoijui commented May 25, 2022

Users running into the problem like I do here,
will not know about this solution,
except they stumble upon this issue or other docu.
That is not very likely, so... how can we change that?

@floriansnow
Copy link
Contributor

@hoijui It's in the official documentation: https://reuse.readthedocs.io/en/stable/usage.html#ignoring-parts-of-a-file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants