GSoC 2024 Wrap-up: MAVProxy AI Chat Enhancements

Hi everyone! I am Aditya Omar, and I'm really happy to share the final work I've done during Google Summer of Code 2024 on the project MAVProxy AI Chat Enhancements.

Description:
This project aims to enhance the capabilities of the existing AI chat module and experiment with certain parameters for better results and reduced latency. Here is the list of PRs merged throughout this period, with brief descriptions.

1. Chat Streaming Support
This was the most awaited feature. It allows the chat module to stream replies and live server events in real time from the OpenAI servers, which reduces latency and provides a better user experience.
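This isn't the exact code that was merged, but a minimal sketch (assuming the OpenAI Python SDK v1 with Assistants v2, and a placeholder assistant ID) of the streaming pattern the feature builds on:

```python
from openai import OpenAI, AssistantEventHandler

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class ChatStreamHandler(AssistantEventHandler):
    """Receives server-sent events and emits text chunks as they arrive."""
    def on_text_delta(self, delta, snapshot):
        # each delta is a small piece of the assistant's reply;
        # the chat module appends it to the reply box instead of printing
        print(delta.value, end="", flush=True)

assistant_id = "asst_XXXX"  # placeholder; the module creates/looks up its own assistant
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What mode is the vehicle in?"
)

with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant_id,
    event_handler=ChatStreamHandler(),
) as stream:
    stream.until_done()
```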

2. Cancel Button
This feature allows users to cancel an active run/prompt on OpenAI's server.
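Under the hood this maps to the run-cancel endpoint of the Assistants API; a minimal sketch with placeholder IDs (the module keeps track of the IDs of the run that is currently in progress):

```python
from openai import OpenAI

client = OpenAI()

# placeholders; the chat module stores these for the active run
thread_id = "thread_XXXX"
run_id = "run_XXXX"

run = client.beta.threads.runs.cancel(run_id=run_id, thread_id=thread_id)
print(run.status)  # usually "cancelling" until the server confirms cancellation
```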

3. Push-to-Record Feature
This PR adds a push-to-record feature. Previously the record button was hard-coded to record instructions for 5 seconds and then send them to the servers; with this change, recording starts on the left-mouse-button press event and stops when the button is released.
This was the most challenging and time-consuming part to implement. We tried multiple approaches, such as MAVProxy pipes and queues, but they didn't work as expected; in the end we used lists (an odd approach) to manage the event state. A rough sketch of the mouse-event side is shown below.
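The real module also has to share the recording state with the rest of MAVProxy (which is where the list-based approach comes in); this sketch, with hypothetical class and handler names, only illustrates the wxPython mouse-event binding, with prints standing in for the real start/stop recording calls:

```python
import wx

class RecordPanel(wx.Panel):
    """Push-to-record: start on left-button press, stop on release."""
    def __init__(self, parent):
        super().__init__(parent)
        self.recording = False
        self.button = wx.Button(self, label="Hold to record")
        # bind raw mouse events instead of the usual button-click event
        self.button.Bind(wx.EVT_LEFT_DOWN, self.on_press)
        self.button.Bind(wx.EVT_LEFT_UP, self.on_release)

    def on_press(self, event):
        self.recording = True
        print("recording started")   # real module starts capturing microphone audio here
        event.Skip()

    def on_release(self, event):
        self.recording = False
        print("recording stopped")   # real module stops capture and sends the audio off
        event.Skip()

if __name__ == "__main__":
    app = wx.App()
    frame = wx.Frame(None, title="push-to-record sketch")
    RecordPanel(frame)
    frame.Show()
    app.MainLoop()
```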

4. Experimenting with Local LLMs
Many thanks to @rmackay9 for testing local LLMs on his PC.
The tech stack was Ollama, plenty of RAM and heavy GPU power.
Randy's system was a Dell XPS 8940 with 128GB of RAM, on which a response from Mistral 7B took around 2.5 minutes.
Next the system's GPU was upgraded to an RTX 3060 (12GB), which reduced the response time to roughly 14s for the prompt "What is current date and time".
After these experiments and observing the system requirements, we concluded that running local LLMs is still a long way from being usable for controlling drones via prompts.
There is a well-written blog post which can be followed to implement function calling via llama.cpp.
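For reference, a minimal sketch of querying a local model through Ollama's REST API (assuming an Ollama server running on the default port 11434 with the Mistral 7B model already pulled via `ollama pull mistral`):

```python
import requests

# send a single non-streaming generation request to the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "What is the current date and time?",
        "stream": False,   # return the full reply in one response
    },
    timeout=300,           # local responses can take minutes without a capable GPU
)
resp.raise_for_status()
print(resp.json()["response"])
```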

5. Tuning the Temperature Parameter
This parameter controls the randomness of the LLM's output for a given prompt/run.
Prompt tested: "Move the vehicle 50m to its right."
So far the prompt has been tested over 5 fresh attempts and 3 continuous attempts.

| Temperature | Accuracy |
|-------------|----------|
| 0.01        | >80%     |
| 0.2         | >80%     |
| 1           | >60%     |
| 1.8         | <20%     |

The data suggests the best value for the temperature parameter is around 0.2; further experimentation shows that the best range is 0.2-0.3.
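In the Assistants v2 API the temperature can be set per run; a minimal sketch, with placeholder IDs, of how a run could be created with a lower temperature:

```python
from openai import OpenAI

client = OpenAI()

assistant_id = "asst_XXXX"  # placeholder; the module manages its own assistant
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Move the vehicle 50m to its right."
)

# lower temperature -> less random output; 0.2 gave the best results in our tests
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant_id,
    temperature=0.2,
)
print(run.status)
```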

Future Prospects:-

  1. OpenAI Assistants are evolving very fast, and so are their APIs, so regular checks and maintenance of the module will be needed; this time we upgraded the API from v1 to v2 to work with GPT-4o.
  2. Exploration and experimentation with more LLMs.
  3. Chat module issues & enhancements list · Issue #1281 · ArduPilot/MAVProxy · GitHub
    The above is the list of issues and enhancements to be worked on in the future.
  4. Testing the temperature parameter on more prompts and for a more diverse set of temperature values.

Final Thoughts:-
First of all I would like to thank my mentors for this project, @rmackay9 and @MichelleRos, for helping me throughout the project and for introducing some exciting features to the module. This was the first time I coded for a real-time project that has users; overall it was a great experience with lots of learnings. I will keep working on cool and exciting projects ahead. Thanks to everyone who was there with me throughout the journey.


Do take a look at this blog post, which provides a concise overview of the project's initiation.

Updated: Results of experimentation with more iterations and various values of temperature.

Prompts tested:
"Move the vehicle 20m to its right."
"Move the vehicle 20m to its left."

The prompts have been tested for 10 iterations at each value of temperature.

| Temperature | Accuracy (%) |
|-------------|--------------|
| 0.01        | 80           |
| 0.1         | 50           |
| 0.2         | 90           |
| 0.3         | 70           |
| 0.4         | 80           |
| 0.5         | 70           |
| 0.6         | 70           |
| 0.7         | 70           |
| 0.8         | 80           |
| 0.9         | 70           |
| 1.0         | 80           |
| 1.1         | 70           |
| 1.2         | 70           |
| 1.5         | 50           |
| 1.8         | 10           |
| 2.0         | 0            |
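For anyone who wants to reproduce or extend these numbers, a hypothetical sweep script along these lines could be used; run_prompt() is a stand-in for sending the prompt through the chat module and checking whether the vehicle performed the correct move:

```python
# hypothetical sweep over temperature values; nothing here is from the chat module itself
TEMPERATURES = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
                0.9, 1.0, 1.1, 1.2, 1.5, 1.8, 2.0]
PROMPTS = ["Move the vehicle 20m to its right.",
           "Move the vehicle 20m to its left."]
ITERATIONS = 10  # iterations per temperature value

def run_prompt(prompt: str, temperature: float) -> bool:
    """Placeholder: send the prompt at the given temperature and return True
    if the vehicle executed the correct movement."""
    raise NotImplementedError

for temp in TEMPERATURES:
    results = [run_prompt(p, temp) for p in PROMPTS for _ in range(ITERATIONS)]
    accuracy = 100 * sum(results) / len(results)
    print(f"temperature={temp}: accuracy={accuracy:.0f}%")
```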

Inference: temperature values can range from 0.01 to 2.
For temperature values between 0.01 and 1, accuracy is variable but the response is slow; response time may or may not be a meaningful factor here, since it also depends on internet connectivity and the OpenAI servers, which are not under our control.

For temperature values from 0.2 to 1.0 the responses were almost the same, but at 0.2 the model completed the correct task almost every time; for the other values it sometimes performed the wrong action, although the prompt response itself never failed.

For temperature values greater than 1.5 accuracy decreased drastically, and at a value of 2 the LLM never completed the task and instead hallucinated a lot.

Based on the table above, the suggested value for the best results is temperature = 0.2.
