OSDI '20 Artifact Evaluation Paper #2 Reviews and Comments
===========================================================================
Paper #2 Testing Database Engines via Pivoted Query Synthesis

Comment @A1 by Reviewer A
---------------------------------------------------------------------------
Hello Authors,

I hope you are well. I am having trouble downloading the submission. For reasons unknown to me, the process is both very slow and fails randomly, either in the middle or close to the end. I have checked my connection and I don't seem to have any issues. Are you able to download it?

Comment @A2 by Manuel Rigger
---------------------------------------------------------------------------
Dear Reviewer,

We are sorry that you are having trouble downloading the artifact. While the Zenodo link works for us, we have also uploaded the artifact to Google Drive and our university's file-hosting platform (which maintains reviewing anonymity):

* https://drive.google.com/file/d/1_6wLGYvTgkfUc1bdBbIKd8__0X4QV256/view?usp=sharing
* https://polybox.ethz.ch/index.php/s/yRAgjinyMxdsRkA

The MD5 should be the same as for the Zenodo-hosted version of the artifact: 1a833637d65c83df5d78bd85d8080242.

We hope that this helps.

Review #2A
===========================================================================

Artifacts Available: Overall Score
----------------------------------
3. Available

Artifacts Functional: Overall Score
-----------------------------------
5. Exceeded expectations

Results Reproduced: Overall Score
---------------------------------
5. Exceeded expectations

Evaluator Confidence
--------------------
3. High

Paper Summary
-------------
The authors propose an approach based on generating semi-random queries to find (tons of) bugs in database systems (e.g., SQLite or MySQL). The approach targets logic bugs, i.e., bugs where, although a result is produced (no crash), that result is incorrect.
Using their tool, they found/reported 121/96 bugs in three popular database systems, most of them addressed by the developers with fixes or documentation changes.

Artifact Summary
----------------
The authors submitted their Java-based tool, ready to compile and use. They delivered a sample database to back up their experimental results, as well as instructions to reproduce them from scratch. They also included a helpful, short video tutorial on how to use their tool.

Comments for Authors
--------------------
Dear Authors,

I initially checked your results using the database you provided; everything looks OK. I tested against the database versions (SQLite) you mention in your paper, let it run for a few hours (I have a time budget), and was able to find several bugs. I also tested against other versions for an hour (MySQL 8.0.21 and SQLite 2.8.17, just the defaults Ubuntu 20 allowed me to install) and found two bugs in the latter (which made me happy) but none in the former. I do not regard the fact that finding bugs takes hours as problematic, since this is inherent to this type of artifact.

This is a tool that could easily be adopted, provided someone on the team is willing to make the effort to extend and maintain it. The tool itself is well designed and written, although I do have some minor cosmetic comments (below).

(Minor) Cons:

1. Output such as "[2020/08/27 17:21:08] Executed 15949 queries (3189 queries/s; 0.40/s dbs, successful statements: 52%). Threads shut down: 0." could be improved. It would be useful to have more information (if possible) on what has been found (if anything), even if approximate, since this would give developers an idea of how long they should run the tool.

2. You need to add more comments to the code. Several classes simply have no documentation, and others have comment lines that are clearly directed at your own team rather than focused on providing knowledge to other teams (which is the purpose of documenting code when open-sourcing it).
3. Some of the links you provide in the sample database do not work, for example, the ones to the SQLite mailing list. I understand that this does not really depend on you, but if they are there and do not work, it is better to remove them.

4. There are many, many configuration options; they could be better documented (or illustrated) in the companion video.

Nevertheless, from my perspective this is an artifact that has what it takes to become a success. I marked it as a distinguished artifact because of its usability and effectiveness; as I said, this is something that could easily be adopted. Congratulations on your amazing work!

Review #2B
===========================================================================

Artifacts Available: Overall Score
----------------------------------
3. Available

Artifacts Functional: Overall Score
-----------------------------------
5. Exceeded expectations

Results Reproduced: Overall Score
---------------------------------
5. Exceeded expectations

Evaluator Confidence
--------------------
2. Medium

Paper Summary
-------------
This paper proposes Pivoted Query Synthesis (PQS) to detect logic bugs in Database Management Systems (DBMSs). Unlike crash bugs, logic bugs lead to unexpected query results rather than system crashes. In Pivoted Query Synthesis, a random row is selected as the pivot row. For the pivot row, a query that is guaranteed to fetch that row is generated. If the actual results of the query do not contain the pivot row, a logic bug has been found. Using the proposed scheme, hundreds of logic bugs were found in SQLite, MySQL, and PostgreSQL.

Artifact Summary
----------------
The artifacts include a copy of the open-source code (sqlancer, https://github.com/sqlancer/sqlancer), an SQLite database file containing the information on all reported bugs, and an introductory video.
With the database file, most statistics in the paper (e.g., the number of bugs in different DBMSs, the ratios of different cases, and the number of SQL statements in a simplified test case) can be validated by executing the provided commands.

Comments for Authors
--------------------
**Artifacts Available**: The artifacts are available on Zenodo. The open-source project is hosted on GitHub.

**Artifacts Functional**: The documentation is detailed. It is easy to check the information and statistics of the reported bugs. In addition to the README, the authors also provide a prerecorded video, giving an introduction to the artifacts and some helpful suggestions for the evaluation. Thanks to the detailed documentation and instructions, the artifacts are easy to evaluate.

For the reported bugs, it is possible to trigger each bug in the corresponding version of the DBMS. I was able to reproduce the bugs in SQLite following the instructions in the documentation.

The open-source code is well maintained on GitHub and has received a lot of interest (570+ stars as of now). In addition to the three DBMSs evaluated in the paper (i.e., SQLite, MySQL, and PostgreSQL), sqlancer now also supports other databases, e.g., CockroachDB and TiDB. Since the proposed scheme is based on standard SQL, sqlancer is flexible and generalizes to many DBMSs. In summary, sqlancer is a practical tool for finding logic bugs in DBMSs.

**Results Reproduced**: The key results in the paper can be reproduced with the artifacts. The evaluation was conducted on a server running Ubuntu 18.04.5 (kernel version: 5.4.0-47-generic). As for dependencies, the Java version is 11.0.8 (OpenJDK) and the Apache Maven version is 3.6.0. Due to the wide use of Java and Maven, the experimental setup is very easy. By querying the database file of bug reports (i.e., "database.db"), we can check the metadata of the bug records.
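The kind of validation query described here can be sketched with Python's built-in `sqlite3` module. Since the exact schema of the artifact's "database.db" is not documented in this review, the sketch below runs against an in-memory stand-in; the table name `DBMS_BUGS` and its columns are hypothetical, chosen only to illustrate checking per-DBMS bug counts:

```python
import sqlite3

# Stand-in for the artifact's "database.db"; the real schema may differ,
# so DBMS_BUGS and its columns are hypothetical names for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DBMS_BUGS (ID INTEGER PRIMARY KEY, DATABASE TEXT, STATUS TEXT)"
)
conn.executemany(
    "INSERT INTO DBMS_BUGS (DATABASE, STATUS) VALUES (?, ?)",
    [("sqlite", "fixed"), ("sqlite", "fixed"),
     ("mysql", "verified"), ("postgres", "fixed")],
)

# Count bugs per DBMS, the kind of aggregate used to check Table 2-style
# statistics against the values reported in the paper.
rows = conn.execute(
    "SELECT DATABASE, COUNT(*) FROM DBMS_BUGS GROUP BY DATABASE ORDER BY DATABASE"
).fetchall()
print(rows)  # [('mysql', 1), ('postgres', 1), ('sqlite', 2)]
```

Against the real artifact, one would open "database.db" instead of ":memory:" and use the queries listed in the README.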
The key results in the paper, such as Table 2, Table 3, and the statements in Section 4.3, can be validated by executing queries on the database file. The query results match the values reported in the paper.

By switching to old versions of SQLite (using `fossil`), I reproduced some of the bugs reported in the paper and the database file. Due to limited time, I checked three bugs (the IDs in "DBMS_BUGS_TRUE_POSITIVES" are 2, 120, and 121), and these bugs were successfully triggered. After the bugfix commit, the pivot rows are fetched as expected. Hence, we can confirm the existence of the reported bugs. In summary, it is easy to reproduce the results with the artifacts.

**Additional comments for improvement**:

- Some hints about the modifications required to support other (e.g., new) DBMSs would be helpful.
- The output of sqlancer for the occurrence of a bug contains many SQL statements, requiring expertise in SQL to simplify the test case and find the root cause of the bug. One of the test cases I encountered had more than 100 SQL statements. It would be interesting to see a tutorial on how to obtain a simplified test case from sqlancer's output. Is it possible to develop an automated tool for the simplification?

Review #2C
===========================================================================

Artifacts Available: Overall Score
----------------------------------
3. Available

Artifacts Functional: Overall Score
-----------------------------------
5. Exceeded expectations

Results Reproduced: Overall Score
---------------------------------
4. Met expectations

Evaluator Confidence
--------------------
2. Medium

Paper Summary
-------------
The paper presents PQS, a general technique for finding _incorrect_ results in a DBMS. The idea behind PQS is to synthesize single-row queries that, when they fail, demonstrate the existence of a logic bug. PQS was implemented in SQLancer and applied to SQLite, MySQL, and PostgreSQL, uncovering a total of 96 bugs.
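The pivot-row check summarized in the reviews can be sketched in a few lines of Python against SQLite. This is a drastically simplified illustration (one table, one comparison predicate), not SQLancer's actual implementation; the table `t0`, column `c0`, and the rectification step shown are assumptions made for the sketch:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t0 (c0 INTEGER)")
conn.executemany("INSERT INTO t0 VALUES (?)",
                 [(random.randint(-10, 10),) for _ in range(20)])

# Step 1: select a random row as the pivot row.
rows = conn.execute("SELECT rowid, c0 FROM t0").fetchall()
pivot_rowid, pivot_c0 = random.choice(rows)

# Step 2: generate a random predicate and rectify it so that it is
# guaranteed to evaluate to TRUE for the pivot row: evaluate it on the
# pivot's value, then wrap it in NOT / IS NULL as needed.
const = random.randint(-10, 10)
op = random.choice(["<", "<=", "=", ">=", ">"])
pred = f"c0 {op} {const}"
(truth,) = conn.execute(f"SELECT {pivot_c0} {op} {const}").fetchone()
if truth == 0:
    pred = f"NOT ({pred})"        # predicate was FALSE on the pivot
elif truth is None:
    pred = f"({pred}) IS NULL"    # predicate was NULL on the pivot

# Step 3: the query must fetch the pivot row; otherwise, a logic bug.
fetched = conn.execute(f"SELECT rowid FROM t0 WHERE {pred}").fetchall()
assert (pivot_rowid,) in fetched, "logic bug: pivot row not fetched"
```

Running this in a loop against a DBMS under test is the essence of the oracle: any failed assertion is, by construction, an incorrect query result rather than a crash.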
Artifact Summary
----------------
The submitted artifacts contain a README.md with a setup and quick-start guide, the submitted version of the paper, a sample database, and the source code of the tool. Additionally---and most impressively---they contain a 30-minute introductory video that I found extremely helpful!

Comments for Authors
--------------------
Thank you for submitting the artifact, and even more for including the video explaining how to set up and use the tool. I found this extremely helpful. I had significant trouble downloading the original artifact and was about to suggest hosting it on GitHub---but was very happy to see that the tool associated with the paper has in fact been made available for retrieval, permanently and publicly.

The artifacts conform to the expectations set by the paper in terms of functionality, usability, and relevance. I would even go further and say that both the submitted and the public tool surpass the expectations set by the paper in these terms. I was able to generate bugs on a Debian server running Linux kernel version 4.4.0-134 and Java SE 1.8.0_251.
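Review #2B asks whether the simplification of a 100+-statement failing test case could be automated. The standard technique for this is delta debugging: repeatedly drop chunks of statements while the bug still triggers. The sketch below is a minimal greedy variant, not SQLancer's reducer; the `triggers_bug` predicate and the statement lists are synthetic stand-ins (in practice the predicate would replay the statements against the DBMS and re-run the pivot check):

```python
def reduce_statements(statements, triggers_bug):
    """Greedily drop chunks of statements while the bug still triggers
    (a simplified variant of delta debugging)."""
    chunk = len(statements) // 2
    while chunk >= 1:
        i = 0
        while i < len(statements):
            candidate = statements[:i] + statements[i + chunk:]
            if candidate and triggers_bug(candidate):
                statements = candidate  # chunk was irrelevant; drop it
            else:
                i += chunk              # chunk is needed; keep it
        chunk //= 2
    return statements

# Synthetic stand-in: pretend the bug needs exactly these two statements.
needed = {"CREATE TABLE t0(c0)", "INSERT INTO t0 VALUES (1)"}
def triggers_bug(stmts):
    return needed <= set(stmts)

full = ["PRAGMA foo"] * 50 + sorted(needed) + ["SELECT 2"] * 49
minimal = reduce_statements(full, triggers_bug)
print(minimal)  # only the two needed statements remain, order preserved
```

Because the final pass uses chunk size 1, the result is one-minimal: removing any single remaining statement makes the bug disappear.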