How Eolymp tests submissions
Today we'll take a look at how Eolymp tests submissions: what happens after you submit a solution, how it is executed, how the results are verified, and what outcomes there are. We hope this article will help you better understand how the system works and make it easier to work with.
We'll also take a look at some of the newer features on the website, namely test suites and interactive problems.
For starters, it's worth mentioning that the website and the testing system are two separate applications. The website allows users to submit solutions and view the results, while the testing system runs and evaluates them. This separation makes it possible to simplify each of the applications and clearly divides responsibilities between them.
When you click the "Submit" button on the website, your solution is added to the testing queue, from which a testing system will pick it up and start the testing process.
Testing queue
The testing queue is an additional buffer between the website and the testing system. Even if the testing system does not have the capacity to process all of the submissions, the website will continue to operate and accept new ones. The queue keeps submissions until one of the testing systems is free.
Several testing systems are always running at Eolymp; they constantly check the queue and pick up new submissions.
Each testing system has a separate dedicated server and verifies only one submission at a time. Submissions are never verified in parallel on the same server, as this could affect the result.
Usually submissions do not stay in the queue for long, but sometimes they can linger. When users make a lot of submissions, they start to accumulate in the queue, and you may see your submission remain in the "pending" status for some time.
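The website-queue-judge split described above can be sketched with a simple producer/consumer loop. This is a minimal illustration in Python (not Eolymp's actual code); the function names are made up for the example.

```python
import queue
import threading

submissions = queue.Queue()  # the buffer between the website and the judges

def website_submit(submission_id: int) -> str:
    # The website only enqueues the submission; it never blocks on judging.
    submissions.put(submission_id)
    return "pending"

def judge_worker(results: dict) -> None:
    # Each testing server runs a loop like this: take one submission,
    # judge it fully, then ask the queue for the next one.
    while True:
        sub = submissions.get()
        if sub is None:  # shutdown signal for this sketch
            break
        results[sub] = "judged"
        submissions.task_done()

results = {}
worker = threading.Thread(target=judge_worker, args=(results,))
worker.start()
for sub_id in (1, 2, 3):
    website_submit(sub_id)
submissions.join()      # wait until every queued submission is judged
submissions.put(None)
worker.join()
print(results)  # {1: 'judged', 2: 'judged', 3: 'judged'}
```

Because the queue sits between the two applications, a burst of submissions simply makes the queue longer; nothing is lost while the workers catch up.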
Preparation and compilation
When the testing system receives a submission from the queue, it starts the preparation and compilation procedure. At this stage, the testing system prepares everything needed to execute your program and perform testing.
First of all, the testing system creates an empty working directory and a new sandbox environment for your submission. Then it creates a file with the source code and executes the compilation command. For example, for C++, the system will create a file source.cpp and execute the command g++ source.cpp -o a.exe.
If the compilation command fails with an error, the system will report the result "Compilation error". In this case, the output of the compilation command is sent back to the website so that you can see the text of the error.
The compilation command differs depending on the programming language. For programming languages that are compiled to machine code (or bytecode), the compilation command creates the appropriate binary file. For languages that do not require compilation, the compilation command is either not executed at all, or it just performs a simple syntax check to make sure you haven't forgotten a ";" somewhere.
If the source file is not needed to run the program, it is removed.
If the compilation is successful, the system proceeds to run your program.
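The compilation stage can be sketched as follows. This is an illustrative Python version (the article's own example uses g++); for an interpreted language, the "compilation" is just the syntax check mentioned above, which Python's built-in compile() can play the role of here. File names mirror the article's convention.

```python
import tempfile
from pathlib import Path

def compile_step(source_code: str, workdir: Path) -> tuple[bool, str]:
    # Create the source file in the working directory, as the testing
    # system does for every submission.
    source = workdir / "source.py"
    source.write_text(source_code)
    try:
        # For interpreted languages the "compilation" is only a syntax
        # check; compile() raises SyntaxError on malformed code.
        compile(source.read_text(), str(source), "exec")
        return True, ""
    except SyntaxError as err:
        # The error text is sent back to the website as "Compilation error".
        return False, str(err)

workdir = Path(tempfile.mkdtemp())
ok, _ = compile_step("print('hello')", workdir)
bad, message = compile_step("print('hello'", workdir)  # missing ')'
print(ok, bad)  # True False
```

If the check fails, the error message is all the user sees on the website, which is why compiler output is captured rather than discarded.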
Running and verifying
First, the input file input.txt is copied to the working directory, and then the run command is executed.
Like the compile command, the run command is different for each programming language. For example, for C++ the run command is simply the name of the binary file, a.exe, and for PHP the execution command looks like this: php source.php.
When launching the program, the testing system redirects the contents of the file input.txt to the program's standard input (stdin), and the standard output (stdout) of the program is redirected to the file output.txt. This means that instead of getting input from the keyboard, your program will read data from the file input.txt, and whatever your program prints to the screen will be automatically saved to the file output.txt.
It works similarly to running your program like this in the command line: a.exe < input.txt > output.txt.
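The run step with both redirections can be sketched like this. A minimal Python illustration, assuming a `run_step` helper that is not part of Eolymp itself; a tiny Python one-liner stands in for the contestant's compiled binary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_step(workdir: Path, command: list[str]) -> int:
    # Equivalent to `a.exe < input.txt > output.txt`: stdin comes from
    # input.txt and stdout goes to output.txt in the working directory.
    with open(workdir / "input.txt") as stdin, \
         open(workdir / "output.txt", "w") as stdout:
        proc = subprocess.run(command, stdin=stdin, stdout=stdout)
    return proc.returncode

workdir = Path(tempfile.mkdtemp())
(workdir / "input.txt").write_text("2 3\n")
# A stand-in "submission": a one-line program that adds two numbers.
code = run_step(workdir, [sys.executable, "-c",
                          "a, b = map(int, input().split()); print(a + b)"])
# exit code 0; output.txt now contains "5"
print(code, (workdir / "output.txt").read_text())
```

The submission never knows the difference: from its point of view it reads from the keyboard and prints to the screen.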
The testing system then monitors the program's process, awaiting its completion. At the moment the program starts, the testing system starts the execution timer. When this timer exceeds the maximum allowed limit (the time limit for program execution), the testing system stops the process with a special signal to the operating system.
Upon completion, the testing system receives data on the resource usage of your program. This data includes the exit code, the amount of memory used, and the processor time used.
In the next step, the testing system starts verifying the answer.
First of all, the program's execution time is checked. The system compares the execution time with the limit, and if it is exceeded, the testing system reports the result "Time limit exceeded".
Then the system checks the memory usage. If the amount of memory used during execution exceeds the allowed limit, the testing system reports the result "Memory Overflow".
Then the exit code is checked. By convention, the exit code of a program should be 0 in case of success, and nonzero in case of an error. For example, if your program tries to access an invalid area of memory, the operating system will terminate it with a nonzero exit code.
Thus, if your program returns a nonzero exit code, the testing system will report the result "Runtime error" and proceed to the next test.
If the exit code is zero, the answer comparison procedure begins.
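The sequence of checks above, time, then memory, then exit code, can be summarized in a few lines. This is a sketch of the decision order only; the outcome names follow the article, and the limits in the example are made up.

```python
def verdict(exit_code: int, time_used: float, memory_used: int,
            time_limit: float, memory_limit: int) -> str:
    # The checks run in the order described in the article.
    if time_used > time_limit:
        return "Time limit exceeded"
    if memory_used > memory_limit:
        return "Memory Overflow"
    if exit_code != 0:
        return "Runtime error"
    # Exit code 0: proceed to comparing the answer with the checker.
    return "Check answer"

print(verdict(0, 0.5, 64_000_000, 1.0, 256_000_000))   # Check answer
print(verdict(11, 0.5, 64_000_000, 1.0, 256_000_000))  # Runtime error
print(verdict(0, 2.0, 64_000_000, 1.0, 256_000_000))   # Time limit exceeded
```

Note that the order matters: a program killed for exceeding the time limit also ends with a nonzero exit code, but the time check runs first, so the reported result is "Time limit exceeded", not "Runtime error".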
At this point, the system copies the file with the correct answer, answer.txt, to the working directory and compares it to the answer in output.txt.
Depending on the problem, the answer can be compared in different ways. A special program is assigned to each problem to perform the comparison. This program is called a "checker".
In many problems, the answer is compared byte by byte, that is, each byte in the output.txt file must match the corresponding byte in answer.txt. In such problems, it is very important to format the answer correctly and add all the necessary spaces and newline characters; otherwise the answer will not be accepted.
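A byte-by-byte checker is essentially a file comparison. This minimal Python sketch (not Eolymp's actual checker) shows why a missing newline is enough to fail:

```python
import tempfile
from pathlib import Path

def byte_checker(output_path: Path, answer_path: Path) -> str:
    # Exact comparison: the files must match byte for byte,
    # including every space and newline character.
    if output_path.read_bytes() == answer_path.read_bytes():
        return "Accepted"
    return "Wrong answer"

workdir = Path(tempfile.mkdtemp())
(workdir / "answer.txt").write_text("1 2 3\n")

(workdir / "output.txt").write_text("1 2 3\n")
print(byte_checker(workdir / "output.txt", workdir / "answer.txt"))  # Accepted

(workdir / "output.txt").write_text("1 2 3")  # missing the final newline
print(byte_checker(workdir / "output.txt", workdir / "answer.txt"))  # Wrong answer
```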
In some problems, values are compared according to their type: numbers are compared as numbers and strings are compared as strings. This allows numbers to be compared with a certain precision, or strings to be compared regardless of letter case. In such problems, additional space and newline characters will not affect the result.
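The token-based comparison can be sketched like this. A simplified illustration of the idea, not Eolymp's implementation; the tolerance value is an arbitrary choice for the example.

```python
def token_checker(output_text: str, answer_text: str,
                  eps: float = 1e-6) -> str:
    # split() collapses all whitespace, so extra spaces and
    # newlines in the output do not matter.
    out_tokens = output_text.split()
    ans_tokens = answer_text.split()
    if len(out_tokens) != len(ans_tokens):
        return "Wrong answer"
    for out_tok, ans_tok in zip(out_tokens, ans_tokens):
        try:
            # Numbers are compared as numbers, with a tolerance.
            if abs(float(out_tok) - float(ans_tok)) > eps:
                return "Wrong answer"
        except ValueError:
            # Strings are compared case-insensitively.
            if out_tok.lower() != ans_tok.lower():
                return "Wrong answer"
    return "Accepted"

print(token_checker("3.1415926\nYES", "3.1415927 yes"))  # Accepted
print(token_checker("3.15 YES", "3.1415927 yes"))        # Wrong answer
```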
In problems where there can be more than one correct answer, a custom checker is used, written specifically for the problem. Checkers of this type usually build the necessary data structures in memory and analyze the answer. For example, in a problem of finding the shortest path in a graph, the checker will actually build the graph and try to follow the path given in the answer.
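For the shortest-path example, a custom checker might look like the following sketch. Everything here is illustrative, the input format, the function name, and the sample graph are invented for the example; the point is that the checker rebuilds the graph and walks the contestant's path instead of comparing text.

```python
def path_checker(edges: list[tuple[int, int, int]], start: int, goal: int,
                 expected_length: int, claimed_path: list[int]) -> str:
    # Rebuild the (undirected, weighted) graph in memory.
    graph: dict[int, dict[int, int]] = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w

    # The path must start and end at the right vertices.
    if claimed_path[0] != start or claimed_path[-1] != goal:
        return "Wrong answer"

    # Walk the path edge by edge, summing the weights.
    total = 0
    for u, v in zip(claimed_path, claimed_path[1:]):
        if v not in graph.get(u, {}):
            return "Wrong answer"  # this edge does not exist
        total += graph[u][v]

    # Any valid path of the known optimal length is accepted.
    return "Accepted" if total == expected_length else "Wrong answer"

edges = [(1, 2, 4), (2, 3, 1), (1, 3, 7)]
print(path_checker(edges, 1, 3, 5, [1, 2, 3]))  # Accepted: 4 + 1 = 5
print(path_checker(edges, 1, 3, 5, [1, 3]))     # Wrong answer: length 7
```

This is why such problems can accept many different answers: the checker validates the path itself rather than matching it against a single reference string.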
Depending on the result of the comparison, the check ends with the result "Accepted" or "Wrong answer". In some problems, the checker may award partial points for the result "Wrong answer".
At this point, the system cleans up the working directory (deletes all files except those required to execute the program) and starts checking the next test.
After the last test, the testing system deletes the working folder and environment and makes a request to get the next submission from the queue.
Test suites
Recently, the website gained the ability to use test suites. You may have noticed that instead of a simple list, some of the tests are now bundled together.
Test suites are groups of tests that are combined together and represent a sub-problem. Each sub-problem represents a special case of the problem. For example, sub-problems can have different constraints on the input values: the first might use values from 1 to 100, and the second values from 101 to 1000. The suite with number 0 usually contains the tests from the statement.
Each test suite has scoring and feedback settings.
There are two options for scoring test suites: each test individually, or the suite as a whole. In the first option, you receive points for each test, regardless of whether you passed the other tests in the suite. In the second option, you need to pass all of the tests in the suite to get points; if even one test fails, the whole suite gives you no points at all.
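The difference between the two scoring modes is easy to see in a few lines. A sketch with invented point values; real problems set their own scores per suite.

```python
def score_suite(test_results: list[bool], points_per_test: int,
                whole: bool) -> int:
    if whole:
        # The suite as a whole: all tests must pass, or it is worth nothing.
        return points_per_test * len(test_results) if all(test_results) else 0
    # Each test individually: every passed test earns its points.
    return points_per_test * sum(test_results)

results = [True, True, False, True]  # one failed test in a suite of four
print(score_suite(results, 5, whole=False))  # 15: points for each passed test
print(score_suite(results, 5, whole=True))   # 0: the whole suite gives nothing
```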
Another feature of test suites is providing limited feedback. Depending on the problem settings, suite tests can be visible or hidden. If feedback is limited, the results are displayed as follows:
If all of the tests in the suite have passed, you will see the outcome "Accepted", and the maximum values for time and memory use are displayed.
If the test suite contains tests that failed, the result of the first non-passing test will be shown along with its execution time and memory usage.
Additionally, the suites may have dependencies among themselves. The author of a problem can configure test suites so that you need to pass all of the tests in one suite for testing of another suite to start. Otherwise, the other suite status will be shown as "Blocked".
So, suites can be scored by test or as a whole, can provide limited feedback, and can also have dependencies among themselves. All these settings are selected by the author of the problem in order to obtain the required level of difficulty.
Interactive problems
For the last stage of the All-Ukrainian Programming Olympiad, so-called interactive problems were also implemented on the website. These are special problems in which a special interactive program is used instead of the static input.txt and output.txt files.
In general, the verification process is the same as the one described above, with the difference that during verification your program is launched alongside an interactive program (the "interactor") created by the author of the problem. The programs are run in such a way that the output of the interactor is passed to the input of your program, and the output of your program is passed as input to the interactor.
This makes it possible to create dynamic tests in which your program can query the interactor and receive dynamic responses.
The test result depends on the result returned by the interactor. If you provided the correct answer, the interactor will send an "Accepted" signal to the testing system, or a "Wrong answer" signal if the answer is not correct.
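The cross-wired pipes described above can be demonstrated in one process with threads. This sketch is purely illustrative: a guess-the-number interactor (an invented problem, not from Eolymp) plays against a binary-search "contestant", with each program's output piped into the other's input.

```python
import os
import threading

results: list[str] = []

def interactor(read_fd: int, write_fd: int, secret: int = 42,
               limit: int = 10) -> None:
    # The author's program: replies "<", ">" or "=" to each guess
    # and reports the final verdict to the testing system.
    outcome = "Wrong answer"
    with os.fdopen(read_fd) as rd, os.fdopen(write_fd, "w") as wr:
        for _ in range(limit):
            guess = int(rd.readline())
            if guess == secret:
                wr.write("=\n"); wr.flush()
                outcome = "Accepted"
                break
            wr.write("<\n" if guess < secret else ">\n"); wr.flush()
    results.append(outcome)

def solution(read_fd: int, write_fd: int, lo: int = 1,
             hi: int = 100) -> None:
    # The contestant's program: binary search over the interactor's replies.
    with os.fdopen(read_fd) as rd, os.fdopen(write_fd, "w") as wr:
        while True:
            mid = (lo + hi) // 2
            wr.write(f"{mid}\n"); wr.flush()
            reply = rd.readline().strip()
            if reply == "=" or not reply:
                return
            if reply == "<":
                lo = mid + 1
            else:
                hi = mid - 1

# Two pipes: the interactor's output feeds the solution's input,
# and the solution's output feeds the interactor's input.
i2s_read, i2s_write = os.pipe()   # interactor -> solution
s2i_read, s2i_write = os.pipe()   # solution -> interactor
t1 = threading.Thread(target=interactor, args=(s2i_read, i2s_write))
t2 = threading.Thread(target=solution, args=(i2s_read, s2i_write))
t1.start(); t2.start(); t1.join(); t2.join()
print(results[0])  # Accepted
```

In the real system the two sides are separate processes, but the principle is the same: the test is a conversation rather than a fixed file, and the interactor decides the outcome.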
Output only problems
Usually, you can find one or more examples in the problem statement. They make it easier to see the format of the input data and to better understand the problem. But the tests that are used to evaluate and validate the solution are not available. That is, you do not just need to calculate the answer; rather, you need to create an algorithm that will work for any set of input data.
But there are also problems in which you are given all the input data and you just need to find the answer. In such problems, there are no limits on execution time or memory usage. You can run your program however you like and as many times as you like. In theory, you can even calculate the answer manually, without writing a program. The only thing that matters is the answer you get.
To submit an answer to problems of this kind, you need to send a solution specifying "Plain Text" as the programming language. In this case, during the check, the system will simply create a file with the answer, output.txt.