Nullpointer exception for pekko-instrumentation in version 1.1.0-M1 onward #1352

Open
TjarkoG opened this issue Jul 25, 2024 · 27 comments

Comments

@TjarkoG
Contributor

TjarkoG commented Jul 25, 2024

When testing Pekko 1.1.0-M1, which will be released soon, we ran into a NullPointerException when running with Kamon and the Kanela instrumentation agent.

To reproduce:
build.sbt:

name := "pekko-quickstart-scala"

version := "1.0"

scalaVersion := "2.13.14"

lazy val pekkoVersion = "1.1.0-M1"

run/fork := true

libraryDependencies ++= Seq(
  "org.apache.pekko" %% "pekko-actor-typed" % pekkoVersion,
  "org.apache.pekko" %% "pekko-http" % pekkoVersion,
  "org.apache.pekko" %% "pekko-stream" % pekkoVersion,
  "ch.qos.logback" % "logback-classic" % "1.5.6",
  "io.kamon" %% "kamon-pekko" % "2.7.3",
  "io.kamon" %% "kamon-system-metrics" % "2.7.3",
  "io.kamon" %% "kamon-prometheus" % "2.7.3"
)

Main.scala:

//#full-example
package com.example

import com.typesafe.config.ConfigFactory
import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.http.scaladsl.Http
import org.apache.pekko.http.scaladsl.client.RequestBuilding.Get
import org.apache.pekko.http.scaladsl.server.Directives

object Main extends Directives {
  def main(args: Array[String]): Unit = {
    val config = ConfigFactory.load()
    implicit val untyped: ActorSystem = ActorSystem.create("Test", config)
    Http()(untyped).singleRequest(Get("https://www.google.com")).onComplete(rsp => println(rsp.map(_.status)))(untyped.dispatcher)
  }
}

When starting with -javaagent:javaagent/kanela-agent.jar (version 1.0.18), I get this exception:

org.apache.pekko.actor.ActorInitializationException: pekko://Test/system/IO-TCP/selectors/$a: exception during creation, root cause message: [Cannot invoke "org.apache.pekko.dispatch.DispatcherPrerequisites.settings()" because the return value of "kamon.instrumentation.pekko.instrumentations.DispatcherInfo$HasDispatcherPrerequisites.dispatcherPrerequisites()" is null]
	at org.apache.pekko.actor.ActorCell.create(ActorCell.scala:679)
	at org.apache.pekko.actor.ActorCell.invokeAll$1(ActorCell.scala:523)
	at org.apache.pekko.actor.ActorCell.systemInvoke(ActorCell.scala:545)
	at org.apache.pekko.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:297)
	at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:232)
	at kamon.instrumentation.executor.ExecutorInstrumentation$InstrumentedForkJoinPool$TimingRunnable.run(ExecutorInstrumentation.scala:800)
	at org.apache.pekko.dispatch.ForkJoinExecutorConfigurator$PekkoForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:61)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: java.lang.reflect.InvocationTargetException: null
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	at org.apache.pekko.util.Reflect$.instantiate(Reflect.scala:82)
	at org.apache.pekko.actor.ArgsReflectConstructor.produce(IndirectActorProducer.scala:111)
	at org.apache.pekko.actor.Props.newActor(Props.scala:236)
	at org.apache.pekko.actor.ActorCell.newActor(ActorCell.scala:626)
	at org.apache.pekko.actor.ActorCell.create(ActorCell.scala:653)
	... 11 common frames omitted
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.pekko.dispatch.DispatcherPrerequisites.settings()" because the return value of "kamon.instrumentation.pekko.instrumentations.DispatcherInfo$HasDispatcherPrerequisites.dispatcherPrerequisites()" is null
	at kamon.instrumentation.pekko.instrumentations.InstrumentNewExecutorServiceOnPekko$.around(DispatcherInstrumentation.scala:130)
	at kamon.instrumentation.pekko.instrumentations.InstrumentNewExecutorServiceOnPekko.around(DispatcherInstrumentation.scala)
	at org.apache.pekko.dispatch.ThreadPoolConfig$ThreadPoolExecutorServiceFactory.createExecutorService(ThreadPoolBuilder.scala)
	at org.apache.pekko.dispatch.Dispatcher$LazyExecutorServiceDelegate.executor$lzycompute(Dispatcher.scala:54)
	at org.apache.pekko.dispatch.Dispatcher$LazyExecutorServiceDelegate.executor(Dispatcher.scala:54)
	at org.apache.pekko.dispatch.ExecutorServiceDelegate.execute(ThreadPoolBuilder.scala:232)
	at org.apache.pekko.dispatch.ExecutorServiceDelegate.execute$(ThreadPoolBuilder.scala:232)
	at org.apache.pekko.dispatch.Dispatcher$LazyExecutorServiceDelegate.execute(Dispatcher.scala:53)
	at org.apache.pekko.dispatch.Dispatcher.executeTask(Dispatcher.scala:91)
	at org.apache.pekko.dispatch.MessageDispatcher.unbatchedExecute(AbstractDispatcher.scala:177)
	at org.apache.pekko.dispatch.BatchingExecutor.execute(BatchingExecutor.scala:143)
	at org.apache.pekko.dispatch.BatchingExecutor.execute$(BatchingExecutor.scala:134)
	at org.apache.pekko.dispatch.MessageDispatcher.execute(AbstractDispatcher.scala:107)
	at org.apache.pekko.util.SerializedSuspendableExecutionContext.attach(SerializedSuspendableExecutionContext.scala:92)
	at org.apache.pekko.util.SerializedSuspendableExecutionContext.execute(SerializedSuspendableExecutionContext.scala:95)
	at org.apache.pekko.io.SelectionHandler$ChannelRegistryImpl.<init>(SelectionHandler.scala:200)
	at org.apache.pekko.io.SelectionHandler.<init>(SelectionHandler.scala:324)
	... 21 common frames omitted
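
For anyone reproducing this from sbt rather than from a hand-written java command, the agent flag could be wired into the build roughly like this. This is only a sketch: it assumes the agent jar sits at javaagent/kanela-agent.jar, the path used in the command line above.

```scala
// build.sbt (fragment)
// javaOptions are only honoured when the application runs in a forked JVM.
run / fork := true
// Assumed jar location; adjust to wherever kanela-agent.jar was downloaded.
run / javaOptions += "-javaagent:javaagent/kanela-agent.jar"
```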

I tried to analyze the error and got as far as this: the instrumentation at
kamon/instrumentation/pekko/instrumentations/DispatcherInstrumentation.scala:164
no longer seems to be triggered with Pekko 1.1.0-M1, which leaves dispatcherPrerequisites null at
kamon/instrumentation/pekko/instrumentations/DispatcherInstrumentation.scala:125.
However, I cannot work out which change between Pekko 1.0.3 and 1.1.0-M1 made that difference, or how to fix it.
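
To make the failure mode concrete, here is a minimal, dependency-free sketch of the pattern described above: a mixin field that some earlier advice is supposed to populate, and a later reader that assumes it is set. Only the names HasDispatcherPrerequisites and dispatcherPrerequisites echo the stack trace; every other name is hypothetical, and this is not Kamon's actual code.

```scala
// Hypothetical stand-ins for the Pekko types named in the stack trace.
final case class Settings(name: String)
final case class Prerequisites(settings: Settings)

// Mimics a mixin whose field is meant to be filled in by earlier advice.
trait HasDispatcherPrerequisites {
  @volatile private var prereqs: Prerequisites = null
  def dispatcherPrerequisites: Prerequisites = prereqs
  def setDispatcherPrerequisites(p: Prerequisites): Unit = prereqs = p
}

class ExecutorServiceFactory extends HasDispatcherPrerequisites

object MixinDemo {
  // Unguarded read: throws NullPointerException if the field was never set.
  def unguardedName(f: HasDispatcherPrerequisites): String =
    f.dispatcherPrerequisites.settings.name

  // Guarded read: Option(null) is None, so a missing value is skipped instead of thrown.
  def guardedName(f: HasDispatcherPrerequisites): Option[String] =
    Option(f.dispatcherPrerequisites).map(_.settings.name)

  def main(args: Array[String]): Unit = {
    val untouched = new ExecutorServiceFactory // the populating advice never ran
    val instrumented = new ExecutorServiceFactory
    instrumented.setDispatcherPrerequisites(Prerequisites(Settings("Test")))

    println(guardedName(untouched))                             // None
    println(guardedName(instrumented))                          // Some(Test)
    println(scala.util.Try(unguardedName(untouched)).isFailure) // true
  }
}
```

A guard like this would only hide the symptom. It does not explain why the populating step stopped firing.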

@pjfanning
Contributor

Pekko 1.1.0-M1 is released. It is a milestone release but the jars are all up on Maven Central.

@pjfanning
Contributor

#1354 PR experiment does not seem to pick up this issue. Could be a gap in test coverage. @TjarkoG could you see if you could add a unit test that reproduces this?

@pjfanning
Contributor

There are some changes in the Pekko dispatch code in v1.1.0-M1. I wouldn't have thought them major.

https://github.com/apache/pekko/commits/0930982b9a5625e622c4c4c2a6f71dc9917cb336/actor/src/main/scala/org/apache/pekko/dispatch

apache/pekko#485 is the biggest change

@TjarkoG
Contributor Author

TjarkoG commented Jul 29, 2024

@pjfanning sorry for the late response.
I've looked into it, but the whole instrumentation logic is sadly outside my area of knowledge.
It took me way longer to gather the information I put above than I'm proud to admit :(

@pjfanning
Contributor

The strange thing is the #1354 CI tests pass for pekko. Any interest in just building your own kamon jars using the branch underlying #1354 ?

@TjarkoG
Contributor Author

TjarkoG commented Jul 29, 2024

I can try whether this helps with our problem, but I'm not too optimistic.
The problems with the "normal" jars also only occur when starting with the kanela-agent for instrumentation and when using the dispatcher "pekko.io.pinned-dispatcher" from the "pekko.io.tcp" section of the reference.conf.
Edit:
I've built the jar locally based on #1354 and tried it, with the same result.

@TjarkoG
Contributor Author

TjarkoG commented Aug 9, 2024

@ivantopo do you have any idea how to create tests for that?

@pjfanning
Contributor

I created https://github.com/pjfanning/kamon-pekko-http-test based on the description here and this runs fine with no issues.

@TjarkoG
Contributor Author

TjarkoG commented Aug 10, 2024

Did you start it with the Kanela Java agent?
Without it, it does work, but we don't get the instrumentation.

@pjfanning
Contributor

> Did you start it with the kanela java-agent? Without it does work but we don't get the instrumentation

Thanks. I hadn't added kanela-agent. The issue happens with it enabled. I updated the sample project. I haven't used Kamon since its early days. I have my own fork that I use myself and it works with Pekko 1.1 (https://github.com/pjfanning/micrometer-pekko).

@pjfanning
Contributor

I may have messed up the test run but when I tried Scala 3.3.3 and 2.12.19, it seemed ok. Maybe this only happens with Scala 2.13.

@Philippus
Contributor

Philippus commented Sep 5, 2024

I'm getting this issue now with Pekko 1.1.0.

@Philippus
Contributor

Philippus commented Sep 5, 2024

I think it's related to inlining. This commit still works: apache/pekko@d829637. This one doesn't: apache/pekko@0f1db53. And if I disable inlining with the pekko.no.inline setting and compile pekko it works.
I'm using Scala 2.13.14.

@pjfanning
Contributor

pjfanning commented Sep 5, 2024

> I think it's related to inlining. This commit still works: apache/pekko@d829637. This one doesn't: apache/pekko@0f1db53. And if I disable inlining with the pekko.no.inline setting and compile pekko it works. I'm using Scala 2.13.14.

Seems plausible. Do you see the same behaviour as me - in that Scala 3 and Scala 2.12 work fine but that the NPE pops up with Scala 2.13? The Pekko inlining does not affect the Scala 3 release of Pekko.

@Philippus
Contributor

> > I think it's related to inlining. This commit still works: apache/pekko@d829637. This one doesn't: apache/pekko@0f1db53. And if I disable inlining with the pekko.no.inline setting and compile pekko it works. I'm using Scala 2.13.14.
>
> Seems plausible. Do you see the same behaviour as me - in that Scala 3 and Scala 2.12 work fine but that the NPE pops up with Scala 2.13? The Pekko inlining does not affect the Scala 3 release of Pekko.

2.12.19 and 3.5.0 didn't give a NPE.

@Philippus
Contributor

Is this a bug in Kamon, or a bug in Pekko ?

@pjfanning
Contributor

Pekko 1.1 uses Scala 2 compiler inlining. Kamon doesn't seem to be able to find the inlined code in Pekko.
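
As a rough illustration of why inlining can hide code from a bytecode agent, here is a hand-written simulation. Nothing in it is Pekko or Kamon code and all names are hypothetical. The counter stands in for advice that an agent attaches to one method; the second caller shows what a call site may look like once the optimizer has copied the method body into it.

```scala
object InliningSketch {
  // Stand-in for advice woven into createExecutor by an agent (hypothetical).
  var adviceHits = 0

  def createExecutor(id: String): String = {
    adviceHits += 1 // the "advice" only runs when the method is really entered
    s"executor-$id"
  }

  // Call site as written in the source: goes through the method, so the advice fires.
  def callerAsWritten(): String = createExecutor("io")

  // Call site as an optimizer might emit it: the body is copied in and
  // createExecutor is never entered, so the advice attached to it is bypassed.
  def callerAfterInlining(): String = {
    val id = "io"
    s"executor-$id"
  }

  def main(args: Array[String]): Unit = {
    callerAsWritten()
    callerAfterInlining()
    println(adviceHits) // 1: only the non-inlined call was observed
  }
}
```

Both callers return the same value, which is why the application itself keeps working while the instrumentation silently misses calls.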

@leszekgruchala

I do have the same issue with 2.13.14

@pjfanning
Contributor

pjfanning commented Sep 9, 2024

To be clear, the Apache Pekko 1.1 releases won't be modified to suit Kamon. We reserve the right in minor releases to make some changes.
The issue here is that Kanela works a bit like AOP and expects to be able to manipulate Pekko bytecode to gather metrics. It looks like the instrumentation logic in Kamon, specifically for Pekko dispatchers, needs to be modified to work with Pekko 1.1.

@mdedetrich just adding you to the issue in case you have any thoughts on this.

If anyone has any thoughts on hooks that could be added to Pekko 1.2 or 2.0 that makes it easier for Kamon to instrument Pekko then feel free to raise issues or PRs in the Pekko repo(s).

@pjfanning
Contributor

I created #1361 as a workaround.

@pjfanning
Contributor

Users hitting this issue should also be able to disable dispatcher instrumentation. See https://github.com/kamon-io/Kamon/blob/124e223de4d7d806b47c76323f5d91edf626fa81/instrumentation/kamon-pekko/src/main/resources/reference.conf and check about excluding dispatchers.
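
A sketch of what such an exclusion might look like, expressed as a config override built in Scala. The key path kamon.instrumentation.pekko.filters.dispatchers is an assumption modelled on the filters section of the linked reference.conf, so please verify it there before relying on it. Whether excluding dispatchers this way actually avoids the NPE has not been confirmed in this thread.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object DispatcherFilterSketch {
  // ASSUMPTION: key path and includes/excludes shape taken from memory of
  // kamon-pekko's reference.conf; check the linked file for the real names.
  val overrides: Config = ConfigFactory.parseString(
    """
      |kamon.instrumentation.pekko.filters.dispatchers {
      |  includes = []
      |  excludes = ["**"]
      |}
      |""".stripMargin)

  def main(args: Array[String]): Unit = {
    // Overrides take precedence; everything else falls back to the normal config.
    val config = overrides.withFallback(ConfigFactory.load())
    println(config.getStringList("kamon.instrumentation.pekko.filters.dispatchers.excludes"))
  }
}
```

Putting the same block into application.conf is probably simpler, since Kamon reads its settings from the regular config files by default.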

@hughsimpson
Contributor

Just catching up on all this, been neglecting it over the summer (sorry about that!). The instrumentation is definitely failing because of the inlining (added some comments about this on a draft pr before I realised you guys had already spotted it 😅). If we can get some instrumentation hooks into pekko I'm sure that would mitigate the issue... I will have a poke about with what might work on the pekko side and see if we can find a place of agreement. Almost all the instrumentation should work for scala 2.12, all of it should work for scala 3, so it's really only 2.13 that suffers much as it currently stands; one fix is, thus, to upgrade to scala 3 (although that's clearly not always possible). From my experiments so far, I don't think it's possible to fix this purely on the Kamon side.

@mdedetrich

I definitely think that instrumentation hooks are the way to go. While inlining may have caused the issue, as @pjfanning pointed out, Pekko reserves the right to change the structure of its stack/methods as it pleases as long as the methods are not public, so the current implementation of Kamon does appear to be brittle for this reason.

If you have some ideas about instrumentation, PR's/design documents are welcome! Supporting this officially via an API is the way to go.

@pjfanning
Contributor

I debugged a broken test in https://github.com/pjfanning/micrometer-pekko (it uses AOP and has similar issues to Kamon). In this one case, calls to ActorCell.stop were being inlined on one particular call path (Scala 2.13 only). The call path related to stopping a RoundRobinRouter instance that had 5 actor refs to round-robin over. What I couldn't work out is why ActorCell.stop got inlined, because it is not marked with the @inline annotation. I decompiled some classes and output some stack traces; these made it look like no inlining occurred, yet the instrumentation is definitely missing the ActorCell.stop call on this call path. The test case works fine with Scala 2.12 and with Pekko 1.0 (Scala 2.12 and Scala 2.13).

@pjfanning
Contributor

pjfanning commented Sep 17, 2024

@mdedetrich is there any way to mark a function so it can't be inlined?

Edit - apparently there is a @noinline annotation. https://www.baeldung.com/scala/inline-noinline-annotations
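
For reference, a minimal sketch of how the two annotations are applied. The method names are hypothetical and unrelated to Pekko's ActorCell. Note that the annotations only influence the Scala 2 optimizer when inlining is enabled in the build.

```scala
object NoInlineSketch {
  // Hint that the optimizer may copy this body into its callers.
  @inline final def fastPath(x: Int): Int = x + 1

  // Asks the compiler never to inline this method, so every call keeps
  // going through a real method entry that an agent can still intercept.
  @noinline def stop(reason: String): String = s"stopped: $reason"

  def main(args: Array[String]): Unit = {
    println(fastPath(41)) // 42
    println(stop("demo")) // stopped: demo
  }
}
```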

@pjfanning
Contributor

I raised apache/pekko#1484 - no guarantees that the community will agree to make changes but it is open for discussion.

@hughsimpson
Contributor

Thanks @pjfanning ! I'll see if I can get a pekko build that passes the suite with a sprinkle of those annotations 🙏

6 participants