
Set test cell score tag using variable in test cell #1889

Open
Adamtaranto opened this issue Jun 8, 2024 · 6 comments

Comments

@Adamtaranto

I want to modify the autograder code to check for a specific variable "total_score" in a test cell and use its value to set the score metadata tag for that test cell.

I am running lots of try/except blocks in my test cell and tallying the total score at the end. If any single test fails I have to manually check the tally for the cell and fill in the score for that cell. I want to automate this process.

Can someone please point me to the autograder code that:

  • pulls in hidden tests
  • runs the test cells
  • updates the metadata tags on the cells of the notebook written to the autograded directory
  • updates the test cell score in gradebook.db

Also, what are the tags that correspond to:

  • total possible score for the cell
  • autograded points
  • manually entered points
  • extra credit points

Thanks!

@Adamtaranto
Author

Adamtaranto commented Jun 14, 2024

Do you think this would be a generally useful feature? @brichet @jhamrick

I can develop it myself; I just need some help understanding the tags and the existing autograder code.

@Adamtaranto
Author

Is anyone available to help with this?

@shreve
Contributor

shreve commented Jul 3, 2024

This already exists. If the test cell outputs text/plain that can be parsed as a float, that will be used as the score for the cell.

def get_partial_grade(output, max_points, log=None):
    """
    Calculates partial grade for a cell, based on contents of
    output["data"]["text/plain"]. Returns a value between 0
    and max_points. Returns max_points (and a warning) for edge cases.
    """

nbgrader/nbgrader/utils.py

Lines 142 to 155 in e0b2e65

elif cell.cell_type == 'code':
    # for code cells, we look at the output. There are three options:
    # 1. output contains an error (no credit);
    # 2. output is a value greater than 0 (partial credit);
    # 3. output is something else, or nothing (full credit).
    for output in cell.outputs:
        # option 1: error, return 0
        if output.output_type == 'error' or output.output_type == "stream" and output.name == "stderr":
            return 0, max_points
        # if not error, then check for option 2, partial credit
        if output.output_type == 'execute_result':
            # is there a single result that can be cast to a float?
            partial_grade = get_partial_grade(output, max_points, log)
            return partial_grade, max_points

auto_score, _ = utils.determine_grade(cell, self.log)
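
In practice, that means the test cell just needs to end with a bare expression whose value is the score. A minimal sketch of such a cell (student_function and the two checks are placeholders, not your actual tests):

# Sketch of a test cell that uses the existing partial-credit path.
# Each placeholder check is worth 1 point.
score = 0

try:
    assert student_function(2) == 4
    score += 1
except Exception:
    print("Failed check 1: student_function(2) should return 4")

try:
    assert student_function(0) == 0
    score += 1
except Exception:
    print("Failed check 2: student_function(0) should return 0")

# Bare final expression -> becomes the cell's execute_result,
# which determine_grade/get_partial_grade parse as the score.
score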

@brichet
Contributor

brichet commented Jul 5, 2024

Thanks @shreve for providing a possible answer.

@Adamtaranto I thought I had already asked (sorry, it looks like I never pressed the button), but I was wondering if you could provide an example of your use case. Actually, if @shreve's answer fixes your issue, it's not necessary.

@Adamtaranto
Author

@shreve's answer gets me part of the way to a solution, thanks.

I will need a way to tag the value I want used as the partial score in cases where there is a lot of other text output.

Here is an example of how I am running multiple tests in an autograder test cell. Most are within try/except blocks so that a failed test doesn't stop subsequent tests from running. I'm printing comments for the feedback reports to help students figure out why they lost marks.

At the end I print the total marks and run an assert statement to check if total == max possible marks.

If the final assert fails then the grader has to check the total marks value and enter it as a manual grade.

# Test student function "translate_orfs()"
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

### --- Test case 1: Seq < 3 bases
# return empty list if < 3 bases
try:
    test_1_output = "NO_OUTPUT" # Default value in case function dies with error
    test_1b_output = "NO_OUTPUT"
    test_1_output = translate_orfs(SeqRecord(Seq(''), 'empty_seq_object',), 2, 1)
    test_1b_output = translate_orfs(SeqRecord(Seq('AA'), 'empty_seq_object',), 2, 1)
    # If function runs, assess for correct output
    assert test_1_output == [] and test_1b_output == []
    # If output is correct assign marks for test case
    test_case_1 = 1
except Exception:
    # If function fails or gives incorrect result set test case marks to 0
    test_case_1 = 0
    # Error summary
    print(f"Failed test 1: Empty Seq has len < 3, student output was {test_1_output} and {test_1b_output}, not empty list.")
    
### --- Test case 2: Test for no ORFs
# Should return an empty list
try:
    test_2_output = "NO_OUTPUT" # Default value in case function dies with error
    test_2_output = translate_orfs(SeqRecord(Seq('TTTTTTTTTT'), 'no_orfs'), 2, 1)
    # If function runs, assess for correct output
    assert test_2_output == []
    # If output is correct assign marks for test case
    test_case_2 = 1
except Exception:
    # If function fails or gives incorrect result set test case marks to 0
    test_case_2 = 0
    # Error summary
    print(f"Failed test 2: Test for no ORFs, student output was {test_2_output}, not empty list.")
    
### --- Test case 3: No input
# Return None if no input
try:
    test_3_output = "NO_OUTPUT" # Default value in case function dies with error
    test_3_output = translate_orfs('', 2, 1)
    # If function runs, assess for correct output
    assert test_3_output is None
    # If output is correct assign marks for test case
    test_case_3 = 1
except Exception:
    # If function fails or gives incorrect result set test case marks to 0
    test_case_3 = 0
    # Error summary
    print(f"Failed test 3: Return None if no input seq, student output was {test_3_output}, not None.")
    
#############

# Tally score for quick visual check.
# Use this to fill in the final score for the test cell.
print(f"Total score: {test_case_1 + test_case_2 + test_case_3}")

# Final assert causes the test cell to log 0 marks if any test fails
assert test_case_1 + test_case_2 + test_case_3 == 3, 'Some tests failed.'

I was thinking of having a decorator function in nbgrader that checks for the value of a specific variable to set the partial grade:

@set_metadata_score(set_from_var='score')
def evaluate_student_answer():
    student_answer = 42
    correct_answer = 42
    
    # Evaluate the student's answer
    if student_answer == correct_answer:
        score = 10  # Full marks
    else:
        score = 0  # No marks
    
    return score

# Execute the function to set the metadata
evaluate_student_answer()

Or maybe the decorator could set a partial-grade metadata tag on the cell, and saveautogrades.py could check for that tag later.
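
Roughly, the decorator I have in mind would look something like this. This is just a sketch: set_metadata_score doesn't exist, and nbgrader would still need a mechanism to read the value back out of the cell:

from functools import wraps

def set_metadata_score(set_from_var='score'):
    """Hypothetical decorator: capture the wrapped function's return value
    so the autograder could use it as the cell's partial score.
    set_from_var would name the tag/key the value is stored under."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            value = func(*args, **kwargs)
            # Placeholder: here the value would need to be attached to the
            # cell (e.g. as an output metadata tag) so that saveautogrades.py
            # could pick it up later.
            return value
        return wrapper
    return decorator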

@shreve
Contributor

shreve commented Jul 8, 2024

A couple of things here.

I need to clarify: the existing partial-credit behaviour only applies when the output has output_type == "execute_result" and mimetype == "text/plain". That is specifically the case for the value of the last expression in a cell. Nothing print()ed is considered for the score. In the following notebook example, 1 would be parsed as the score:

# Input cell
[1]: print("0")
     1
# Stream output from print()
     0
# Execute result
[1]: 1

So your simple example would actually work without the decorator:

# Input cell
[1]:  def evaluate_student_answer():
          student_answer = 42
          correct_answer = 42
          
          print("Noisy computing")

          # Evaluate the student's answer
          if student_answer == correct_answer:
              score = 10  # Full marks
          else:
              score = 0  # No marks
          
          return score
      
      evaluate_student_answer()
# Stream output from print()
      Noisy computing
# Execute result
[1]: 10

However, there is a path forward if you're set on the metadata option.

Nbgrader operates at the Jupyter client level, not inside the kernel, so it doesn't know anything about your variables inside the notebook. It sends the notebook to the kernel and determines a grade based on the output. When you're grading student code inside the kernel, you can send custom metadata back to the client via the display_pub, which is meant to display dynamic responses in the Jupyter notebook context.

For example, if your grading code looked like this:

from IPython import InteractiveShell

def report_score(score):
    InteractiveShell.instance().display_pub.publish(
        # This first data parameter could be an empty dict if you don't want to display anything
        {
            "text/plain": f"Your score is {score}",
            "text/html": f"Your score is <b>{score}</b>"
        },
        metadata={"score": score}
    )

def evaluate_student_answer():
    student_answer = 42
    correct_answer = 42
    
    # Evaluate the student's answer
    if student_answer == correct_answer:
        score = 10  # Full marks
    else:
        score = 0  # No marks
    
    report_score(score)

# Execute the function to set the metadata
evaluate_student_answer()

After the ExecutePreprocessor runs, the student notebook will have a cell with some outputs that look like the following:

  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "893ff932-3c10-46ce-89d9-8c390f8b1aab",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "Your score is <b>10</b>"
      ],
      "text/plain": [
       "Your score is 10"
      ]
     },
     "metadata": {
      "score": 10
     },
     "output_type": "display_data"
    }
   ],
   "source": [...]
  }

The determine_grade function is already looking at cell outputs. You could update it to look for that metadata.score or something similar. To be clear, this is basically what is already happening with the execute_result.

This is really rough and would need some shaping before pushing upstream into nbgrader, but this is enough to build a proof of concept for yourself.
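
For what it's worth, here is a rough sketch of how that loop could be extended to honour a "score" metadata key. This is not actual nbgrader code, just the shape of the change:

for output in cell.outputs:
    # existing behaviour: any error output means zero credit
    if output.output_type == 'error' or (output.output_type == "stream" and output.name == "stderr"):
        return 0, max_points
    # new: an output carrying a "score" metadata key sets the partial grade
    score = output.get("metadata", {}).get("score")
    if score is not None:
        partial_grade = min(max(float(score), 0.0), float(max_points))
        return partial_grade, max_points
    # existing behaviour: execute_result parsed as a float
    if output.output_type == 'execute_result':
        partial_grade = get_partial_grade(output, max_points, log)
        return partial_grade, max_points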
