GSoC 2026: Question on AI-Assisted Log Diagnosis project (data & scope)

Hi everyone,

I’m a prospective GSoC 2026 applicant interested in the
“AI-Assisted Log Diagnosis & Root-Cause Detection” project.

I have a few questions to help me understand the scope:

  1. What types of labeled data (if any) currently exist for common failure modes or misconfigurations in ArduPilot logs?
  2. Are there existing datasets, issue references, or log repositories that are typically used as a starting point for this work?
  3. Would an initial prototype be expected to focus more on supervised classification, similarity-based retrieval, or rule-assisted ML approaches?

For background, I have experience with Python-based ML pipelines, data preprocessing, and model evaluation, and I’m looking to align my preparation with the project’s expectations.

Any guidance would be greatly appreciated.

Thanks!
Krishna

  1. None exist.
  2. None exist.
  3. I guess supervised classification and rule-assisted ML approaches.


Thanks for the clarification — that makes sense.

I’ll start by exploring ArduPilot logs and existing failure patterns, and prototype a rule-based labeling pipeline that could later support supervised models.
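
As a rough sketch of what I mean by a rule-based labeling pipeline: each rule is a named check over per-log summary features, and the labels that fire become weak training labels for a later supervised model. The feature names (`vibe_max`, `volt_min`, `gps_sats_min`) and all thresholds below are placeholders I made up for illustration, not values taken from ArduPilot, and extracting the features from real logs is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Summary features for one log. All keys are hypothetical placeholders.
Features = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    label: str                              # weak label emitted when the rule fires
    predicate: Callable[[Features], bool]   # check over the summary features


# Illustrative thresholds only; real values would need to come from
# ArduPilot documentation and maintainer feedback.
RULES: List[Rule] = [
    Rule("high_vibration", lambda f: f.get("vibe_max", 0.0) > 60.0),
    Rule("low_battery_voltage", lambda f: f.get("volt_min", float("inf")) < 10.5),
    Rule("poor_gps", lambda f: f.get("gps_sats_min", float("inf")) < 6),
]


def label_log(features: Features) -> List[str]:
    """Return the label of every rule that fires; an empty list means none matched."""
    return [rule.label for rule in RULES if rule.predicate(features)]


def build_dataset(logs: Dict[str, Features]) -> List[Tuple[str, Features, List[str]]]:
    """Pair each log ID with its features and weak labels."""
    return [(log_id, feats, label_log(feats)) for log_id, feats in logs.items()]


if __name__ == "__main__":
    example_logs = {
        "log_a": {"vibe_max": 75.0, "volt_min": 11.8, "gps_sats_min": 12},
        "log_b": {"vibe_max": 20.0, "volt_min": 9.9, "gps_sats_min": 4},
    }
    for log_id, _, labels in build_dataset(example_logs):
        print(log_id, labels)
```

A missing feature never triggers a rule here (the defaults are chosen so each comparison is false), so logs with partial data stay unlabeled instead of being mislabeled.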