Hi @DavidBuzz and @tridge,
Thanks for having a look. The plotting side of the application still needs a lot of work to be useful, but the fundamentals of uploading and processing a log are pretty solid. The log processing and plot session hosting do occur server side in this application. As you suggested, server resources could become an issue if many users were performing large amounts of analysis on big logs. I have attempted to minimise server CPU and RAM usage by trying to be clever with how the application works (which I will cover at the end of this post if you are interested…).
@tridge Your suggestion of client-side processing and log display + interaction is a really good idea for a lightweight, almost server-less implementation (the server would only serve the JavaScript, which then does all the work client side). That form of implementation has its own set of limitations when it comes to sharing logs and analysis, but it really comes down to what service(s) ArduPilot wants to offer its user base. Sorry for adding confusion re the GSoC objectives, that was not my intention!
I’d be interested to know whether a refined version of this application would be at all useful to the ArduPilot community, and if so, I would happily take on ideas to help meet desired requirements and features. On the flip side, if it is not envisioned to be of use, that’s also perfectly fine, as I’m enjoying using the project to teach myself some new Python web technologies.
Some notes on how the application works and where / how I have tried to save server memory:
Uploaded files are placed in a Redis queue to be processed by a single Celery background task. By limiting the background task to one worker, logs are processed sequentially in the order they were uploaded: even if multiple users upload multiple logs, the lone processing task will only ever consume as much memory as the size of the log it is currently processing.
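As a rough sketch of that pattern (the broker URL, task name, and module layout here are illustrative, not the application's actual code):

```python
# tasks.py - minimal sketch of the single-worker queue idea.
from celery import Celery

app = Celery('log_processor', broker='redis://localhost:6379/0')

@app.task
def process_log(log_path):
    """Parse one uploaded log and write its processed form to disk."""
    ...  # heavy parsing happens here, one log at a time

# Running the worker with a single process guarantees sequential,
# upload-order processing, so peak memory is bounded by one log:
#   celery -A tasks worker --concurrency=1
```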
The heavy lifting of the processing occurs in a modified version of mavmemlog, which writes the log to disk in a directory structure. All messages of each type are stored in binary numpy arrays with named columns, while other useful bits of data (such as flight modes, params, etc.) are later stored in JSON text files.
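Something along these lines, where the field names, dtypes, and directory layout are all made up for the example:

```python
import json
import os
import numpy as np

os.makedirs('processed', exist_ok=True)

# e.g. all ATT messages collected while walking the log once
att = np.array(
    [(0.00, 1.2, -0.4), (0.02, 1.3, -0.5)],
    dtype=[('TimeUS', 'f8'), ('Roll', 'f4'), ('Pitch', 'f4')],
)
np.save('processed/ATT.npy', att)  # one binary file per message type

# smaller odds and ends go to JSON side files
with open('processed/modes.json', 'w') as f:
    json.dump([{'t': 0.0, 'mode': 'STABILIZE'}], f)
```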
Attacking the problem this way means each log is only ever processed once, and it allows the message types of interest (now binary numpy arrays with named columns) to be loaded very quickly from file. This significantly reduces memory requirements during the plotting operation and makes math / conditional masking easy.
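Continuing the same made-up example, the plotting side only has to do:

```python
import numpy as np

att = np.load('processed/ATT.npy')   # fast: a single binary read
roll = att['Roll']                   # named-column access
later = att[att['TimeUS'] > 0.01]    # conditional masking in one line
```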
Lastly, I was having issues with the Bokeh plotting servers not releasing memory correctly, so I wrapped the server and added a (currently very simple) memory-watching load balancer that only spins up a new Bokeh server when a user starts an analysis on a log. If the session times out or the user navigates away from the analysis page, the Bokeh server connection is closed, the process is stopped, and it re-enters the available server pool.
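In sketch form, the pool manager does something like the following (the port handling, memory ceiling, and use of psutil are all assumptions for illustration, not the real implementation):

```python
import subprocess
import psutil

MAX_RSS = 500 * 1024 * 1024  # assumed per-server memory ceiling

def start_session(app_path, port):
    # spin up a dedicated bokeh server for this analysis session
    return subprocess.Popen(['bokeh', 'serve', app_path, '--port', str(port)])

def over_budget(proc):
    # the "memory watching" part: check the server's resident set size
    return psutil.Process(proc.pid).memory_info().rss > MAX_RSS

def end_session(proc, free_ports, port):
    # session timed out or user navigated away: stop the process
    proc.terminate()
    proc.wait()
    free_ports.append(port)  # port re-enters the available pool
```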
Cheers,
Sam