GSoC 2024 Wrap-up: MAVProxy AI Chat Enhancements

Hi everyone! I am Aditya Omar, and I'm really happy to share the final work I've done during Google Summer of Code 2024 on the project MAVProxy AI Chat Enhancements.

Description:
This project aims to enhance the capabilities of the existing AI chat module and experiment with certain parameters for better results and reduced latency. Here is the list of PRs merged throughout this period, with brief descriptions.

1. Chat Streaming Support
This was the most awaited feature. It allows the chat module to stream replies and live server events in real time from the OpenAI servers, which reduces latency and provides a better user experience.
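This isn't the exact code that was merged, but a minimal sketch (assuming the OpenAI Python SDK v1 with Assistants v2, and a placeholder assistant ID) of the streaming pattern the feature builds on:

```python
from openai import OpenAI, AssistantEventHandler

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class ChatStreamHandler(AssistantEventHandler):
    """Receives server-sent events and emits text chunks as they arrive."""
    def on_text_delta(self, delta, snapshot):
        # each delta is a small piece of the assistant's reply;
        # the chat module appends it to the reply box instead of printing
        print(delta.value, end="", flush=True)

assistant_id = "asst_XXXX"  # placeholder; the module creates/looks up its own assistant
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What mode is the vehicle in?"
)

with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant_id,
    event_handler=ChatStreamHandler(),
) as stream:
    stream.until_done()
```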

2. Cancel Button
This feature allows users to cancel an active run/prompt on OpenAI's server.
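Under the hood this maps to the run-cancel endpoint of the Assistants API; a minimal sketch with placeholder IDs (the module keeps track of the IDs of the run that is currently in progress):

```python
from openai import OpenAI

client = OpenAI()

# placeholders; the chat module stores these for the active run
thread_id = "thread_XXXX"
run_id = "run_XXXX"

run = client.beta.threads.runs.cancel(run_id=run_id, thread_id=thread_id)
print(run.status)  # usually "cancelling" until the server confirms cancellation
```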

3. Push-to-Record Feature
This PR adds a push-to-record feature. Previously the record button was hard-coded to record instructions for 5 seconds and then send them to the servers; with this change, recording starts on the left-mouse-button press event and stops when the button is released.
This was the most challenging and time-consuming part to implement. We tried multiple approaches, such as MAVProxy pipes and queues, but they didn't work as expected; in the end we used lists (an odd approach) to manage the event state. A rough sketch of the mouse-event side is shown below.
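The real module also has to share the recording state with the rest of MAVProxy (which is where the list-based approach comes in); this sketch, with hypothetical class and handler names, only illustrates the wxPython mouse-event binding, with prints standing in for the real start/stop recording calls:

```python
import wx

class RecordPanel(wx.Panel):
    """Push-to-record: start on left-button press, stop on release."""
    def __init__(self, parent):
        super().__init__(parent)
        self.recording = False
        self.button = wx.Button(self, label="Hold to record")
        # bind raw mouse events instead of the usual button-click event
        self.button.Bind(wx.EVT_LEFT_DOWN, self.on_press)
        self.button.Bind(wx.EVT_LEFT_UP, self.on_release)

    def on_press(self, event):
        self.recording = True
        print("recording started")   # real module starts capturing microphone audio here
        event.Skip()

    def on_release(self, event):
        self.recording = False
        print("recording stopped")   # real module stops capture and sends the audio off
        event.Skip()

if __name__ == "__main__":
    app = wx.App()
    frame = wx.Frame(None, title="push-to-record sketch")
    RecordPanel(frame)
    frame.Show()
    app.MainLoop()
```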

4. Experimenting with Local LLMs
Many thanks to @rmackay9 for testing local LLMs on his PC.
The tech stack was Ollama, plenty of RAM and heavy GPU power.
Randy's system was a Dell XPS 8940 with 128GB of RAM, on which a response from Mistral 7B took around 2.5 minutes.
Next the system's GPU was upgraded to an RTX 3060 (12GB), which reduced the response time to roughly 14s for the prompt "What is current date and time".
After these experiments and observing the system requirements, we concluded that running local LLMs is still a long way from being usable for controlling drones via prompts.
There is a well-written blog post which can be followed to implement function calling via llama.cpp.
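For reference, a minimal sketch of querying a local model through Ollama's REST API (assuming an Ollama server running on the default port 11434 with the Mistral 7B model already pulled via `ollama pull mistral`):

```python
import requests

# send a single non-streaming generation request to the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "What is the current date and time?",
        "stream": False,   # return the full reply in one response
    },
    timeout=300,           # local responses can take minutes without a capable GPU
)
resp.raise_for_status()
print(resp.json()["response"])
```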

5. Tuning the Temperature Parameter
This parameter controls the randomness of the LLM's output for a given prompt/run.
Prompt tested: "Move the vehicle 50m to its right."
So far the prompt has been tested over 5 fresh attempts and 3 continuous attempts.

| Temperature | Accuracy |
|-------------|----------|
| 0.01        | >80%     |
| 0.2         | >80%     |
| 1           | >60%     |
| 1.8         | <20%     |

The data suggests the best value for the temperature parameter is around 0.2; further experimentation shows that the best range is 0.2-0.3.
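In the Assistants v2 API the temperature can be set per run; a minimal sketch, with placeholder IDs, of how a run could be created with a lower temperature:

```python
from openai import OpenAI

client = OpenAI()

assistant_id = "asst_XXXX"  # placeholder; the module manages its own assistant
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Move the vehicle 50m to its right."
)

# lower temperature -> less random output; 0.2 gave the best results in our tests
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant_id,
    temperature=0.2,
)
print(run.status)
```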

Future Prospects:-

  1. OpenAI Assistants are evolving very fast, and so are their APIs, so regular checks and maintenance of the module will be needed; this time we upgraded the API from v1 to v2 to work with GPT-4o.
  2. Exploration and experimentation with more LLMs.
  3. Chat module issues & enhancements list · Issue #1281 · ArduPilot/MAVProxy · GitHub
    The above is the list of issues and enhancements to be worked on in the future.
  4. Testing the temperature parameter on more prompts and for a more diverse set of temperature values.

Final Thoughts:-
First of all I would like to thank my mentors for this project, @rmackay9 and @MichelleRos, for helping me throughout the project and for introducing some exciting features to the module. This was the first time I coded for a real-time project that has users; overall it was a great experience with lots of learnings. I will keep working on cool and exciting projects ahead. Thanks to everyone who was there with me throughout the journey.


Do take a look at this blog post, which provides a concise overview of the project's initiation.

Updated: Results of experimentation with more iterations and various values of temperature.

Prompts tested:
"Move the vehicle 20m to its right."
"Move the vehicle 20m to its left."

The prompts have been tested for 10 iterations at each value of temperature.

| Temperature | Accuracy (%) |
|-------------|--------------|
| 0.01        | 80           |
| 0.1         | 50           |
| 0.2         | 90           |
| 0.3         | 70           |
| 0.4         | 80           |
| 0.5         | 70           |
| 0.6         | 70           |
| 0.7         | 70           |
| 0.8         | 80           |
| 0.9         | 70           |
| 1.0         | 80           |
| 1.1         | 70           |
| 1.2         | 70           |
| 1.5         | 50           |
| 1.8         | 10           |
| 2.0         | 0            |
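For anyone who wants to reproduce or extend these numbers, a hypothetical sweep script along these lines could be used; run_prompt() is a stand-in for sending the prompt through the chat module and checking whether the vehicle performed the correct move:

```python
# hypothetical sweep over temperature values; nothing here is from the chat module itself
TEMPERATURES = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
                0.9, 1.0, 1.1, 1.2, 1.5, 1.8, 2.0]
PROMPTS = ["Move the vehicle 20m to its right.",
           "Move the vehicle 20m to its left."]
ITERATIONS = 10  # iterations per temperature value

def run_prompt(prompt: str, temperature: float) -> bool:
    """Placeholder: send the prompt at the given temperature and return True
    if the vehicle executed the correct movement."""
    raise NotImplementedError

for temp in TEMPERATURES:
    results = [run_prompt(p, temp) for p in PROMPTS for _ in range(ITERATIONS)]
    accuracy = 100 * sum(results) / len(results)
    print(f"temperature={temp}: accuracy={accuracy:.0f}%")
```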

Inference: temperature values can range from 0.01 to 2.
For temperature values between 0.01 and 1, accuracy is variable but the response is slow; response time may or may not be a meaningful factor here, since it also depends on internet connectivity and the OpenAI servers, which are not under our control.

For temperature values from 0.2 to 1.0 the responses were almost the same, but at 0.2 the model completed the correct task almost every time; for the other values it sometimes performed the wrong action, although the prompt response itself never failed.

For temperature values greater than 1.5 accuracy decreased drastically, and at a value of 2 the LLM never completed the task and instead hallucinated a lot.

Based on the table above, the suggested value for the best results is temperature = 0.2.
