Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data race in concurrent uses of Re.exec on one pre-compiled Re.re #287

Open
kit-ty-kate opened this issue May 21, 2024 · 7 comments
Open

Comments

@kit-ty-kate
Copy link
Member

Using TSan with OCaml 5.2 on a code that use re.1.11.0, I'm getting the following error:

WARNING: ThreadSanitizer: data race (pid=76587)
  Read of size 8 at 0xffff60737008 by thread T16 (mutexes: write M0):
    #0 camlRe__Core.loop_1012 lib/core.ml:166 (opamMain.exe+0xbe42e4) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    #1 camlRe__Core.match_str_1142 lib/core.ml:304 (opamMain.exe+0xbe57ec) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    #2 camlRe__Core.exec_2101 lib/core.ml:941 (opamMain.exe+0xbec9f8) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    #3 camlOpamUrl.fun_1957 src/core/opamUrl.ml:70 (opamMain.exe+0xb83a88) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    ...

  Previous write of size 8 at 0xffff60737008 by thread T1 (mutexes: write M1):
    #0 caml_modify runtime/memory.c:220 (opamMain.exe+0xe287fc) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    #1 camlRe__Core.validate_991 lib/core.ml:162 (opamMain.exe+0xbe4108) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    #2 camlRe__Core.loop_1012 lib/core.ml:175 (opamMain.exe+0xbe4454) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    #3 camlRe__Core.match_str_1142 lib/core.ml:304 (opamMain.exe+0xbe57ec) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    #4 camlRe__Core.exec_2101 lib/core.ml:941 (opamMain.exe+0xbec9f8) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    #5 camlOpamUrl.fun_1957 src/core/opamUrl.ml:70 (opamMain.exe+0xb83a88) (BuildId: ae1ab72786b3638d51b987620a76af5acea9e7b6)
    ...

The code in opam that uses Re is the following: https://github.com/ocaml/opam/blob/391333d35bcdc8b55df709b876b8bafcf75f3452/src/core/opamUrl.ml#L46-L74 and can be summarized into:

let f =
  let re = Re.compile ... in
  fun str -> Re.exec re str

which then trigger the data race on this piece of mutable state:

st.next.(color) <- st'

let st' = st.next.(Char.code info.colors.[Char.code s.[pos]]) in

@OlivierNicole
Copy link

Cc @vouillon in case you didn’t get notified and you find this of interest. Looking at this from 20,000 feet, it looks like Re has been designed to be safe for concurrent use with systhreads, but is not currently domain-safe.

@vouillon
Copy link
Member

ocaml-re is not thread-safe at all.

@OlivierNicole
Copy link

I see. So making it thread-safe would at best take some hours of work.

A possible temporary workaround, obviously unsatisfactory on many levels, would be to sequence all calls into the library by protecting them with a global mutex.

@kit-ty-kate
Copy link
Member Author

ocaml-re is not thread-safe at all.

Is the whole library thread-unsafe or is it only specific functions like exec here? My assumption from what I'm seeing so far seems to be that any function processing Re.re is thread-unsafe but once you have Re.t it should be fine to use.

@rgrinberg
Copy link
Member

rgrinberg commented Aug 27, 2024

Constructing Re.t should be thread-safe. The resulting Re.re should only be used by one thread at a time.

@edwintorok
Copy link
Contributor

ocaml-re is not thread-safe at all.

This is news to me, I thought only Re.Str was not thread safe, and the rest of Re was.
I attempted to document that here: #194
(that is the reason we moved from the compiler's stdlib Str, to Re.Posix, because Re.Posix was believed to be thread-safe).

If the entire 're' is not thread safe currently, it would be good to document this so that users of the library can add appropriate mutexes. (Although we've been using re in production for years and haven't noticed issues with it on OCaml 4).

Although it looks like ocaml-re is definitely not domain-safe on OCaml 5.
See mirage/ocaml-uri#178 (comment) and mirage/ocaml-uri#178 (comment) which reproduces corruption in the match positions. Although Re.exec is called from 2 different threads and the resulting value is only used by a single thread, the match positions still somehow end up sharing state across threads.

@vouillon
Copy link
Member

Maybe we could add a way to fully compile a regular expression immediately, rather than lazily. Then, it would be thread-safe to match against this regular expression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants