Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crond job successful execute a few times, but stuck without any error log later #88

Open
szmengran opened this issue Oct 29, 2021 · 8 comments
Labels
bug Something isn't working int-shortlist

Comments

@szmengran
Copy link

szmengran commented Oct 29, 2021

config like this:

input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://xxx/xxx"
    jdbc_user => "xxx"
    jdbc_password => "xxx"
    jdbc_validate_connection => true
    jdbc_driver_library => "./logstash-core/lib/jars/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    schedule => "*/1 * * * * Asia/Shanghai"
    statement => "select * from memberbaseinfo a where lastlogintime > :sql_last_value"
    clean_run => false
    last_run_metadata_path => "./pipeline/.logstash_jdbc_last_run"
    plugin_timezone => "local"
    jdbc_default_timezone => "Asia/Shanghai"
  }
}
output {
  elasticsearch {
    hosts => ["xxx:9200"]
    action=>"update"
    index => "tags"
    doc_as_upsert => true
    document_id => "%{memberid}"
  }
  stdout {
    codec => json_lines
  }
}

The results are as follows:

[2021-10-28T18:51:08,490][INFO ][logstash.inputs.jdbc     ] (0.807486s) select a.memberid, a.nickname, a.sex, a.birthday, a.marriage, a.workcity, a.salary, a.education, ((1<<9 & health)>>9) ^ if(membertype>5, 2, 0) type, lastlogintime from memberbaseinfo a where lastlogintime > '2021-10-28 18:50:01'
{"@version":"1","birthday":"1993-12-24T14:57:02.000Z","nickname":"xx","workcity":10126099,"type":0,"@timestamp":"2021-10-28T10:51:08.495Z","sex":0,"education":5,"salary":5,"marriage":1,"lastlogintime":"2022-02-02T03:59:33.000Z","memberid":1828694900}
[2021-10-28T18:52:02,739][INFO ][logstash.inputs.jdbc     ] (0.363404s) select a.memberid, a.nickname, a.sex, a.birthday, a.marriage, a.workcity, a.salary, a.education, ((1<<9 & health)>>9) ^ if(membertype>5, 2, 0) type, lastlogintime from memberbaseinfo a where lastlogintime > '2021-10-28 18:51:07'
{"@version":"1","birthday":"1993-12-24T14:57:02.000Z","nickname":"ss","workcity":10126099,"type":0,"@timestamp":"2021-10-28T10:52:02.747Z","sex":0,"education":5,"salary":5,"marriage":1,"lastlogintime":"2022-02-02T03:59:33.000Z","memberid":1828694900}
[2021-10-28T18:53:05,752][INFO ][logstash.inputs.jdbc     ] (1.157730s) select a.memberid, a.nickname, a.sex, a.birthday, a.marriage, a.workcity, a.salary, a.education, ((1<<9 & health)>>9) ^ if(membertype>5, 2, 0) type, lastlogintime from memberbaseinfo a where lastlogintime > '2021-10-28 18:52:02'
{"@version":"1","birthday":"1993-12-24T14:57:02.000Z","nickname":"ss","workcity":10126099,"type":0,"@timestamp":"2021-10-28T10:53:05.767Z","sex":0,"education":5,"salary":5,"marriage":1,"lastlogintime":"2022-02-02T03:59:33.000Z","memberid":1828694900}

next job should execute at 2021-10-28T18:54:05,752, but always stuck without eny error log.

@szmengran szmengran added the bug Something isn't working label Oct 29, 2021
@jsvd
Copy link
Member

jsvd commented Nov 9, 2021

I assume you're using a logstash-integration-jdbc >=5.0.1? Any chance you could downgrade to 5.0.1 with:

bin/logstash-plugin install --version 5.0.1 logstash-integration-jdbc

And see if the issue still occurs?

@kares
Copy link
Contributor

kares commented Dec 27, 2021

for the record we've seen a similar issue (using multiple pipelines/inputs), downgrading to --version 5.0.1 does not help.

@kares
Copy link
Contributor

kares commented Feb 10, 2022

@szmengran we're trying to reproduce scenarios such as yours but could use details if you still have some for us.

where there any other pipelines involved in your LS installation besides the configuration you shared?

next job should execute at 2021-10-28T18:54:05,752, but always stuck without eny error log.

how long did you wait before restarting LS, did it never recover in after several minutes?

also knowing your LS version ...

@alromos
Copy link

alromos commented Sep 30, 2022

It looks like we are facing a similar issue in our project.
We have 25 logstash pipelines utilizing JDBC input plugin.
All the pipelines have similar schedule configuration, something like this:

schedule => "30 * * * * *"

Each pipeline have its own value instead of 30.

The pipelines may execute without any issues for a long time, however sometimes some random pipeline out of those 25 suddenly stops its prepared statement execution.
For instance, we see the following logs for some time:

2022-09-29T14:00:00.000Z (0.2s) EXECUTE prepared-statement-1; [2022-09-29 12:00:20 UTC]
2022-09-29T14:01:00.000Z (0.2s) EXECUTE prepared-statement-1; [2022-09-29 12:00:20 UTC]
2022-09-29T14:02:00.000Z (0.2s) EXECUTE prepared-statement-1; [2022-09-29 12:00:20 UTC]

and at some point the logs don't show up anymore.

The pipeline doesn't recover and we need to re-start logstash instance in order to fix inconsistency between our DB and elasticsearch. Sometimes pipeline stays in the invalid state for a day or two when there are no new records in DB and there are no inconsistencies.

Recently we've updated our Logstash docker image to the latest version available - 8.4.2, however the issue still persists without any errors in logs.

@kares have you managed to resolve production issues you've mentioned in the logstash meta issue: improving scheduler performance and reliability? It looks like we have the similar configuration to the one listed in the issue:

multiple schedulers (~20) started from multiple pipelines

If you managed to resolve the issues, does logstash require some additional configuration (usage of some specific rufus scheduler version, for instance) or does it work as expected out of the box?

@alromos
Copy link

alromos commented Nov 15, 2022

@andsel @yaauie
Hi, I can see you took part in code reviews of scheduler changes by @kares.
Do you have any ideas why the issue I've described above may be happening?
Thank you in advance!

@kares
Copy link
Contributor

kares commented Nov 15, 2022

there might be an update from the LS team, the last resolution I was involved with ended up NOT being scheduler gem dependent but the user ended up having long query response times (e.g. due swapping). as a potential "improvement" I would recommend that you try spacing out your pipelines (using the cron expression) so they do not trigger JDBC execution at the exact same second.

that's all I have - please do not ping me directly as it's been some time and most there is to the issue from my perspective is out in the comments - folks are watching the issue so you'll get help if anyone has anything to add,
unfortunately, the direct ping gets around the unsubscribe. good luck!

@andsel
Copy link
Contributor

andsel commented Nov 21, 2022

Logstash v8.3.0 ships the logstash-integration-jdbc v5.3.0 which carries all the improvement on the scheduling side, in particular include the logstash-mixin-scheduler:v1.0 and rufus v3.0.9. The actual version of Rufus is 3.8 which changed a lot the API and probably has a lot of fixes, but completely breaks the compatibility at code level as I remember.

I would follow the suggestion from Kares to try to do not trigger multiple execution of JDBC at the same time and check if fixes.

To investigate further, would be great if you can provide a kind of reproducible configuration that trigger the problem.

@alromos
Copy link

alromos commented Jun 14, 2023

FYI
In my case the issue was caused by a hanging JDBC driver connection:
microsoft/mssql-jdbc#2093

Is it possible to protect pipelines from such issues on logstash side? Is there a way to specify read timeout value for a query on pipeline level?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working int-shortlist
Projects
None yet
Development

No branches or pull requests

5 participants