Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Table lock: tests run with tables that are read-only by default #5381

Merged
merged 5 commits into from
Oct 8, 2021

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Apr 2, 2021

Issue

Ref #5303.

Description of changes

Lock Table and provide a context manager that opens it.

This is a draft; it will be interesting to see situations in which it fails. Here's one fascinating find: 17d37fe.

Includes
  • Code changes
  • Tests
  • Documentation

@janezd janezd added the needs discussion Core developers need to discuss the issue label Apr 29, 2021
@janezd janezd removed the needs discussion Core developers need to discuss the issue label May 7, 2021
@janezd janezd force-pushed the table-lock branch 3 times, most recently from d912603 to 28e51ef Compare May 8, 2021 10:38
@janezd
Copy link
Contributor Author

janezd commented May 9, 2021

@markotoplak, food for thought: some projection widgets (FreeViz, LinearProjection) output the projection matrix, that is, positions of "anchors" (projected base vectors, variables) in the projection as Table. A widget connected to this would almost always store this table as a reference, not a copy.

The user can move these "anchors", which changes the corresponding Table. The output may or may not be immediately re-sent. Even if sent immediatelly, there is a (very short) lag. Worse still, if the destination widget checks whether the new input data differs from the previousindeed changed, it won't notice it -- because it receives the same instance as before (and doesn't retain a copy of the data to check).

To avoid this problem, a widget must never send a table if it later changes it (place). In practice this means that all output tables must be explicitly constructed for output, without storing it.

@markotoplak
Copy link
Member

@janezd, yes, exactly, as soon as something goes through the signal it should not be changed anymore, or we get potential subtle bugs like this one: a receiving widget's stored table is changed before a signal is emitted.

@janezd janezd force-pushed the table-lock branch 2 times, most recently from aca50e3 to 8e50762 Compare May 15, 2021 09:20
@janezd
Copy link
Contributor Author

janezd commented May 15, 2021

Say that some function calls eliminate_zeros on Table's X. (We may call this function, say, JaccardModel._compute_sparse.) Must it do this on a copy in order to not change the data? Or is a sparse Table not changed by merely eliminating zeros?

In practice, this should be important only if some code stores csr's internal arrays and expects they won't change, or if some code treats zeros differently than missing data.

BTW, unless I've broken something with the last changes, we're down to only 5 failing tests. Yay.

@janezd janezd force-pushed the table-lock branch 2 times, most recently from 03ce796 to 2223f50 Compare May 15, 2021 12:33
@janezd
Copy link
Contributor Author

janezd commented May 20, 2021

The two remaining issues are due to CatGB: its Cython code expects a writeable buffer, though it probably does not actually change it. In our code, we solve this by adding a const specifier (as in def contingency(const double[:] x, int nx, const double[:] y, int ny):). For third-party code, I implemented a Table.force_unlocked, which does what its name says, but shouldn't be used if we're not sure the table won't be modified.

So: does any of these methods (to my knowledge we support several implementations) modify the array?

@janezd janezd force-pushed the table-lock branch 2 times, most recently from ea1d720 to 2508651 Compare May 23, 2021 12:21
@codecov
Copy link

codecov bot commented May 23, 2021

Codecov Report

Merging #5381 (375e1b2) into master (2e4803d) will increase coverage by 0.01%.
The diff coverage is 90.83%.

@@            Coverage Diff             @@
##           master    #5381      +/-   ##
==========================================
+ Coverage   85.96%   85.97%   +0.01%     
==========================================
  Files         313      313              
  Lines       65471    65641     +170     
==========================================
+ Hits        56280    56435     +155     
- Misses       9191     9206      +15     

@janezd
Copy link
Contributor Author

janezd commented May 23, 2021

This now passes the tests. What still needs to be done:

@markotoplak, @lanzagar, could you inspect this before I continue?

@irgolic
Copy link
Member

irgolic commented May 26, 2021

This idea's really creative, and the execution is very thorough. Good job! 👏

However, I believe #5303 could be solved simpler and easier within orange-canvas-core – after all, the problem you are solving stems from Table as a format of serialization between widgets. If widgets are linked in a single chain, it isn't wrong to alter data directly.

Throwing an exception on data modification will break a LOOOOT of code, not just within Orange, but also in scripts that use Orange as a data mining library, and other projects which use Orange Table as their backbone. And their use cases for treating Table as mutable are completely valid.

I'd rather not force people to learn to write with table.unlocked(): if not necessary, I've never seen the with keyword used like this before, do you have an example from a well-known project? I'm afraid it will raise Orange's already steep learning curve, which I tried to flatten by adding DataFrame setters. If it is unavoidable, what's the exception/warning message going to be? Please don't consider this an afterthought, the ability to teach a solution in a one-line warning message is an important characteristic in its appraisal.

@janezd
Copy link
Contributor Author

janezd commented May 27, 2021

I'm OK with issuing a warning, and letting it forever be a warning, not an exception. Though there's this problem that Table.X is made read-only, so some of the new exceptions are raised by numpy. We can think of a way around it.

Furthermore, I'm OK with adding a global flag whether these warnings are enabled or not. Orange canvas would set it and thus show warning (for add-ons; the core code should be clean!). Any other projects can leave the flag unset.

I believe that the problem is not related only to workflows. With any parallelism, changing the table may be problematic.

I realize that this steepens the learning curve. The current exception says that the table is read-only unless unlocked; this should make the user look into documentation.

I don't know of any uses of with that are exactly like this, but this is in line with what with is. It was intended to manage resources in such a way that something is set/opened/changed/... and at the end of the block some code that cleans this up is guaranteed to execute. (with is also presented as replacement for try/finally). One example is opening a file (and the manager from this PR opens a table :), another is mocking in unit tests. So the use of with in this context isn't a particular hack.

@irgolic
Copy link
Member

irgolic commented May 27, 2021

Though there's this problem that Table.X is made read-only, so some of the new exceptions are raised by numpy. We can think of a way around it.

That'd be ideal.

Furthermore, I'm OK with adding a global flag whether these warnings are enabled or not.

This is a great idea :D

The current exception says that the table is read-only unless unlocked; this should make the user look into documentation.

Don't rely on users looking into documentation, it'll make most people give up instead. If this is something that almost anyone who tries using Python Script will encounter, it'd be nice if we could guide them well with an error message. Could we offer an alternative suggestion in the sense of 'you should copy the array or unlock it if you're not using parallelism'?

On that note, I'm also interested in what messages a DataFrame created from a locked numpy array will show (table.X_df).

After a little bit, there's probably going to be a StackOverflow question on the top of Google search results for 'table is read-only unless unlocked', but let's try not to rely on that.

@markotoplak
Copy link
Member

If widgets are linked in a single chain, it isn't wrong to alter data directly.

@irgolic, no, it is still wrong. Imagine that a chain is modifying data, and then a user changes the parameters of a widget in the chain.

@markotoplak
Copy link
Member

@irgolic, I do understand your points and agree that as long as you keep your table local, it is fine to change it.

But if the table is passed through a signal, that changes. Even without any parallelism we will get problems (because a widget in a chain can be reapplied with new settings). I am for irreversibly locking tables that passed through a signal. First as a deprecation warning, of course. That would make it easier for users - no unlocking statements necessary. :)

@irgolic
Copy link
Member

irgolic commented May 27, 2021

no, it is still wrong. Imagine that a chain is modifying data, and then a user changes the parameters of a widget in the chain.

You're right.

I am for irreversibly locking tables that passed through a signal. First as a deprecation warning, of course. That would make it easier for users - no unlocking statements necessary.

I'm seriously against erroring on an unlocked table in Python Script. Throw a warning and briefly explain what they did wrong. Think 'best effort'.
But I'd be down for setting up our CI to fail if a table is unlocked after it's been passed through a signal?

This got me thinking about how links could visibly stale if the table they sent through is ever unlocked, and would require recomputation.

@janezd janezd force-pushed the table-lock branch 4 times, most recently from 408fa64 to 490a9b7 Compare June 11, 2021 14:03
Orange/data/table.py Outdated Show resolved Hide resolved
Orange/data/table.py Outdated Show resolved Hide resolved
@janezd
Copy link
Contributor Author

janezd commented Jun 11, 2021

@markotoplak proposed a very clever solution, which I happily implemented: tables are locked within tests, but not when running Orange (or individual widgets).

Barring any bugs in implementation, locking should have no effect on runtime, but it should show any violations through failed tests. This PR takes care of core Orange, while tests for add-ons will most certainly fail after this is merged.

Implementation details: locking is regulated by Table.LOCKING flag, which is False by default. It is enabled by the three init files for tests (= in CI) and by the module that definesWidgetTest (= when testing widgets locally). Developers can manually enable it for other tests, too. If they don't, problems in their code will be exposed on CI.

@janezd janezd force-pushed the table-lock branch 2 times, most recently from 69dda8a to 7b736f6 Compare June 16, 2021 13:21
@janezd janezd mentioned this pull request Jun 21, 2021
@markotoplak
Copy link
Member

markotoplak commented Oct 1, 2021

Notes to self, TODO for merge:

  • Make sure that table lock can be easily disabled in add-on tests. They can not be rewritten to use the unlocking decorators until a new version of Orange is released.
  • Decide about subarray replacement
  • Decide about the retain shortcut.
  • Check test_owmergedata.
  • Check Orange.data.table.
  • Check test_table.

@markotoplak
Copy link
Member

Hm, at first I wanted to use your subarray shortcut but then test_valueFilter_string_case_insens failed. It failed in the third line:

        d = data.Table("zoo")
        with d.unlocked():
            d[d[:, "name"].metas[:, 0] == "girl", "name"] = "GIrl"

I got ValueError: assignment destination is read-only.

Let's analyze what happened: d[:, "name"] is a selection for which original metas can be perfectly reused, and is thus copied and locked. Because it is the same array, d.metas is locked too, and the assignment fails. So whenever tables share arrays, the locking mechanism can fail.

Here we saw a problem in a single thread, which we can "fix" by not doing the optimization and we get a correct behavior of not allowing to unlock [d[:, "name"] while d is still unlocked. :)

And this reminds me of race conditions: having multiple threads unlocking the same table or an array shared between tables could be an issue. Fortunately, we avoid editing tables anyway so perhaps this is not such an issue, just perhaps some badly written code could start randomly crashing if table locking was enabled. But that code was likely wrong already. So I wouldn't worry about it, or can you, @janezd, think of a problematic case?

I'd avoid the optimization and would go for un-unlockable views, so original semantics in these cases. And I would not worry about multithreading, because if someone has multiple threads editing the same table, they deserve crashes.

Finally, should we also search for all cases when tables could share arrays and try to code around them? I wouldn't worry too much about it, I'd fix it when something starts failing even though it should not.

@janezd
Copy link
Contributor Author

janezd commented Oct 7, 2021

If computation of some table('s column) is separated into two threads, which simultaneously unlock it and set the data, this would fail although it is kind of legit. But the workaround is to unlock the table before starting the threads, so I don't see it as a big problem.

Your example with metas is more disturbing because one table can share an array with another without realizing it, and this can be different to track. Maybe we should just avoid optimizations and always copy?

@janezd
Copy link
Contributor Author

janezd commented Oct 7, 2021

@markotoplak, I'm OK with all your changes. You can tidy this up by squashing them into the original two commits, if you wish.

@markotoplak markotoplak changed the title Table lock [ENH] Table lock: tests run with tables that are read-only by default Oct 8, 2021
@markotoplak markotoplak merged commit ec30f34 into biolab:master Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants