Switch back to subprocessing for llama.cpp

This should resolve a number of memory leak and stability defects by isolating
llama.cpp in a separate process that we can shut down when idle and restart
gracefully if it runs into problems.  It is also a first step toward running
multiple copies to support multiple models concurrently.
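
The isolation idea can be illustrated with a minimal POSIX sketch. This is not
the code in this commit: run_child and supervise are hypothetical names, and
the restart policy (retry until a clean exit or an attempt limit) is an
assumption chosen for illustration. Idle shutdown is not shown.

```cpp
#include <cerrno>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run args[0] with its arguments in a child process and
// wait for it. Returns the child's exit code, or -1 if the child could not be
// forked, could not be waited on, or was terminated by a signal.
int run_child(const std::vector<std::string>& args) {
    if (args.empty()) return -1;

    // Build a NULL-terminated argv array for execvp.
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if exec failed
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;  // retry only when interrupted
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Hypothetical supervisor: relaunch the child until it exits cleanly or
// max_attempts launches have been made. Returns the number of launches.
int supervise(const std::vector<std::string>& args, int max_attempts) {
    int attempts = 0;
    while (attempts < max_attempts) {
        ++attempts;
        if (run_child(args) == 0) break;  // clean exit, stop restarting
    }
    return attempts;
}
```

Because the child owns its own address space, any memory it leaks is reclaimed
by the operating system when it exits, and a crash in the child does not take
down the parent.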
Daniel Hiltgen
2024-03-14 10:24:13 -07:00
parent 3b6a9154dd
commit 58d95cc9bd
35 changed files with 1416 additions and 1910 deletions

@@ -2768,7 +2768,7 @@ inline void signal_handler(int signal) {
     shutdown_handler(signal);
 }
-int _main(int argc, char **argv)
+int main(int argc, char **argv)
 {
 #if SERVER_VERBOSE != 1
     log_disable();