@vector-of-bool
Contributor

Refer: CDRIVER-6107

This changeset introduces the ability to apply labels/tags to test cases, and exposes those as CTest labels.

This also includes new general mutable string and array/vector handling utilities, and refactors a lot of our test suite to use those.

Changes:

  • 6d908aa Adds support for a because clause on all mlib_check assertions. This message is included in the output when the program terminates, and acts as inline documentation for why a particular assertion is present.
  • d636e33 Brings mstr, the data-owning counterpart to mstr_view. This piece has been in a stash for a while, and was brought out as a better way to handle mutable strings.
  • 9675ae5 Adds vec.t.h, a generic template header that is used to define vector types. Basically:
    • #define a type T, and then #include <mlib/vec.t.h>, and it will define a T_vec type that acts as a dynamically sized contiguous array of T objects.
    • Additional macros can be defined to add copy/destroy semantics to the vector objects.
  • f97883e Adds mstr_trim for removing whitespace from string views.
  • 47629d7 Refactors a lot of our test suite code to use mstr and vec.t.h types. Also adds mlib/str_vec.h, which just defines mstr_vec for the very common "array of strings" type.
  • b39b795 Adds support for specifying tags/labels associated with test cases:
    • The tags are specified within the same string as the test name. (There was an attempt to add a second string parameter for tags, but it was extremely cumbersome to update all of the test cases across the codebase to pass a second empty string, especially with all the different signatures for adding test cases. A future refactor may want to consolidate our many "TestSuite_Add" functions/macros.)
    • After the test case name, place one or more spaces, and then a list of [bracketed][tags].
    • To support exporting of tags for LoadTests.cmake, the --tests-cmake switch will print CMake code that defines all the test cases. I wanted to use JSON to emit the test cases, but CMake's JSON handling functionality is incredibly slow for very large JSON blobs, so directly emitting CMake code was chosen instead.
    • LoadTests.cmake will evaluate the emitted CMake code and use it to call add_test for all the test cases, as well as apply the test case labels and fixtures.
    • If a test has a tag [uses:foo], then LoadTests.cmake will add a FIXTURES_REQUIRED of mongoc/fixtures/foo. This currently only applies to the IMDS tests, but will eventually be used for other test case fixtures.

This change allows for test cases to declare any number of associated
"tags". The tags are specified after the test case name string as a
list of bracketed tags, mimicking Catch2's syntax.

The LoadTests.cmake script has been modified to apply each test case's declared
tags as CTest labels. This also gives us the ability to apply test
fixture requirements granularly on only tests that declare their
requirement (via a tag).
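
As a rough illustration of the `name [tag][tag]` syntax described above, the following sketch splits a combined string into the test name and its tags. The names `parsed_spec` and `split_test_spec`, and the fixed limits, are hypothetical and chosen for this example only; it also assumes test names contain no spaces. It is not the parser used by this changeset.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical limits, for this sketch only.
enum { MAX_TAGS = 8, MAX_TAG_LEN = 64 };

typedef struct parsed_spec {
   size_t name_len;                  // Length of the test name prefix
   size_t n_tags;                    // Number of tags found
   char tags[MAX_TAGS][MAX_TAG_LEN]; // Tag text, without the brackets
} parsed_spec;

// Split "/some/test [a][b]" into the name length and the list of tags.
// Returns 0 on success, or -1 if the tag list is malformed or too large.
int
split_test_spec (const char *spec, parsed_spec *out)
{
   memset (out, 0, sizeof *out);
   // Assumption: the name runs up to the first space (or the end of input).
   const char *sp = strchr (spec, ' ');
   if (!sp) {
      out->name_len = strlen (spec);
      return 0;
   }
   out->name_len = (size_t) (sp - spec);
   const char *p = sp;
   while (*p == ' ') {
      ++p; // One or more spaces separate the name from the tags
   }
   while (*p) {
      if (*p != '[') {
         return -1; // Every tag must open with a bracket
      }
      const char *close = strchr (p, ']');
      if (!close) {
         return -1; // Unterminated tag
      }
      const size_t len = (size_t) (close - (p + 1));
      if (out->n_tags == MAX_TAGS || len >= MAX_TAG_LEN) {
         return -1; // Exceeds the fixed limits of this sketch
      }
      memcpy (out->tags[out->n_tags], p + 1, len);
      out->tags[out->n_tags][len] = '\0';
      ++out->n_tags;
      p = close + 1;
   }
   return 0;
}
```

A caller would pass the full registration string and then inspect `tags` to decide, for example, which CTest labels to emit.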
@vector-of-bool vector-of-bool marked this pull request as ready for review October 20, 2025 23:26
@vector-of-bool vector-of-bool requested a review from a team as a code owner October 20, 2025 23:26
@vector-of-bool vector-of-bool requested review from kevinAlbs and rcsanchez97 and removed request for rcsanchez97 October 20, 2025 23:26
@kevinAlbs kevinAlbs requested a review from eramongodb October 22, 2025 16:32
Contributor

@eramongodb eramongodb left a comment

Initial feedback. TIL RESOURCE_LOCK, this is fantastic. I am also very excited about the mlib_str and mlib_vec APIs.


Some benchmarks, averaged across 10 runs ('~' is standard deviation), testing only /bson/* (438 tests, all safe to execute in parallel) with --no-fork (/bson/value/big_copy is skipped):

  • test-libmongoc (current): 0.79 secs (~0.05)
  • test-libmongoc (PR): 0.82 secs (~0.04)
  • ctest: 4.56 secs (~0.75)
  • ctest -j 0: 1.57 secs (~0.12)

It is surprising to me that ctest with parallelism is almost twice as slow as the single-threaded test-libmongoc. We may need to explore improving the test registration and filtering setup (in a followup PR is fine) to avoid this performance overhead as much as possible (the eager test registration skips implemented by this PR in TestSuite_Add*() do not seem to be sufficient).


Just a note that the current output of ctest --print-labels is:

All Labels:
  ipv6
  lock:fake-kms
  lock:live-server
  test-libmongoc-generated
  timeout:10
  timeout:20
  timeout:30
  uses:fake_kms_provider_server
  uses:simple-http-server-18000

Update: "json" and "slow" have been removed.


Some questions:

  • What motivated the addition of [timeout:N] to certain tests and not others? Is the default timeout of 10 seconds too aggressive (e.g. perhaps 60 secs may be preferable)?
  • What is the purpose of the [json] and [slow] tags, both of which are only used by JSON tests?

Contributor

I prefer we avoid introducing any more "X macro headers" if we can help it.

This may be a large ask, but I believe we should be able to apply the "template" pattern used by MONGOC_DECL_SPECIAL_TS_POOL to implement the following API instead:

#ifndef MLIB_VEC_H_INCLUDED
#define MLIB_VEC_H_INCLUDED

... // Implementation details.

#define mlib_vec_declare(_name, _type)     \
  typedef struct MLIB_PASTE(_name, _vec) { \
    _type *data;                           \
    ...                                    \
  };                                       \
  ...

#endif // MLIB_VEC_H_INCLUDED

#include <mlib/vec.h> // Normal header.

...

#define TestSkipVec_DestroyElement(Skip) TestSkip_Destroy(Skip)

mlib_vec_declare(TestSkipVec, TestSkip)

I expect conditionally-defined API (e.g. VecDestroyElement(Ptr)) can be implemented using a pattern similar to bsonPredicate, but where the "not defined" case is ignored rather than an error:

#define bsonPredicate(P) _bsonPredicate _bsonDSL_nothing()(P)
#define _bsonPredicate(P) _bsonPredicate_Condition_##P
#define _bsonPredicate_Condition_ __NOTE__Missing_name_for_a_predicate_expression
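
To make the single-macro idea concrete, here is a minimal sketch of a "declare" macro that stamps out a vector type and two functions in one expansion. `SKETCH_VEC_DECLARE`, the field names, and the growth policy are all hypothetical and are not part of mlib. The sketch omits overflow checks on the capacity and the conditional copy/destroy hooks discussed in this thread. Note that the entire body has to live inside one macro, so it cannot carry `//` comments or per-function doc comments.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: declares `Name`, `Name_push`, and `Name_destroy`.
 * The trailing typedef only exists so the invocation can end with `;`. */
#define SKETCH_VEC_DECLARE(Name, T)                             \
   typedef struct Name {                                        \
      T *data;                                                  \
      size_t size;                                              \
      size_t capacity;                                          \
   } Name;                                                      \
   static inline int Name##_push (Name *v, T value)             \
   {                                                            \
      if (v->size == v->capacity) {                             \
         size_t new_cap = v->capacity ? v->capacity * 2 : 4;    \
         T *p = (T *) realloc (v->data, new_cap * sizeof (T));  \
         if (!p) {                                              \
            return -1;                                          \
         }                                                      \
         v->data = p;                                           \
         v->capacity = new_cap;                                 \
      }                                                         \
      v->data[v->size++] = value;                               \
      return 0;                                                 \
   }                                                            \
   static inline void Name##_destroy (Name *v)                  \
   {                                                            \
      free (v->data);                                           \
      v->data = NULL;                                           \
      v->size = v->capacity = 0;                                \
   }                                                            \
   typedef int Name##_requires_semicolon

/* Example instantiation for this sketch: a vector of int. */
SKETCH_VEC_DECLARE (int_vec, int);
```

A zero-initialized `int_vec` is a valid empty vector here, so usage is `int_vec v = {0}; int_vec_push (&v, 42); int_vec_destroy (&v);`.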

Contributor Author

While possible, defining the vec type in terms of many function pointers would make it significantly more complex to modify and would have performance and type-checking implications.

Alternatively, defining everything in a gigantic macro definition would also be possible, but would be much more difficult to edit and would lose editor doc comment assistance. In both cases, doing conditional inclusion of APIs (i.e. conditional copy, conditional non-trivial destroy) would be much more difficult.

Contributor

@eramongodb eramongodb Oct 24, 2025

😔 If that is the case, I'd prefer at least that we use a non-.h extension so that tools do not mistake this for a standalone-includable header (thus forcing exclusion filters to avoid errors or noisy diagnostics, e.g. as currently required by bsoncxx/enums/* headers in the C++ Driver here, here, and here).

Contributor Author

I've used alternative suffixes for such "special-include" files before, such as .inl or .inc. Is there any preference? (Not .i, as that usually means "file of preprocessor output")

Contributor

@eramongodb eramongodb Oct 24, 2025

I think .inl ("inline") is typically associated with out-of-line template definitions and/or inline functions for C++, so I slightly prefer the .inc extension, but it seems to be very overloaded in usage across languages and tools. I initially thought just swapping .t.h to .h.t would work, but .t is apparently used for Perl test scripts.

Perhaps .th may work. It's what we already have now ("template header"), but drops the . in t.h. I cannot find any use of this file extension which is relevant enough to be potentially conflated with our usage.

Contributor

Suggest renaming vec.t.h to vec.th (first choice) or vec.inc (second choice).

@vector-of-bool
Contributor Author

@eramongodb I'm unfortunately aware of the big slowness when running tests via CTest. The issue is indeed caused by the overhead of spawning the subprocess which then needs to register all the test cases, even if it only needs one of them.

I'm hoping in a future change, after more adoption of CTest, to speed it up significantly by simplifying the startup process, but for now it runs fast enough that I don't worry about it.

re: [slow][json] tags: Those were leftover from an earlier attempt to handle timeouts and live tests. I've removed them now.

re: [timeout:N] I wanted to have a much stricter timeout on tests, and only exempt those that were really really slow, and have tests declare their own timeout in-situ, so [timeout:N] was added as a way to do that. It may be sufficient to just set the default timeout really high, but I wanted to catch tests that were slow and specifically call them out in-source. This also gives the ability to select tests that don't need a big timeout (i.e. will run very fast) with ctest --label-exclude timeout:.

Comment on lines -24 to -25
#include <mongoc/mongoc-ssl.h>

Contributor

Is this removal necessary/intentional given the now-guarded include in test-mongoc-topology-scanner.c?

Contributor

Suggest renaming vec.t.h to vec.th (first choice) or vec.inc (second choice).

// Note: We do not initialize any of the data in the newly allocated region.
// We only set the null terminator. It is up to the caller to do the rest of
// the init.
data[new_len] = 0;
Contributor

Suggested change
- data[new_len] = 0;
+ data[new_len] = '\0';

char literal for char assignment.

Comment on lines +806 to +807
const mstr ret = {0};
return ret;
Contributor

Suggested change
- const mstr ret = {0};
- return ret;
+ return (mstr){0};

Simplify?

/**
* @brief Like `mstr_sprintf_append`, but accepts the va_list directly.
*/
MLIB_IF_GNU_LIKE(__attribute__((format(printf, 2, 0))))
Contributor

This (and other occurrences of the) printf attribute in this file may need to special-case for GCC mingw compatibility by using gnu_printf, e.g. see "third issue" PR description of mongodb/libmongocrypt#1054 (comment) and (note: the version check is likely unnecessary at this point given other occurrences of this pattern):

#if defined(__clang__)
#define BSON_GNUC_PRINTF(f, v) __attribute__((format(printf, f, v)))
#elif BSON_GNUC_CHECK_VERSION(4, 4)
#define BSON_GNUC_PRINTF(f, v) __attribute__((format(gnu_printf, f, v)))
#else
#define BSON_GNUC_PRINTF(f, v)
#endif

Comment on lines +1029 to +1035
size_t added_len = strlen(format);
// Give use some wiggle room since we are inserting characters:
if (mlib_mul(&added_len, 2)) {
// Doubling the size overflowed. Not likely, but just use the original string
// size and grow as-needed.
added_len = strlen(format);
}
Contributor

Suggested change
- size_t added_len = strlen(format);
- // Give use some wiggle room since we are inserting characters:
- if (mlib_mul(&added_len, 2)) {
- // Doubling the size overflowed. Not likely, but just use the original string
- // size and grow as-needed.
- added_len = strlen(format);
- }
+ const size_t format_len = strlen(format);
+ size_t added_len = format_len;
+ // Give use some wiggle room since we are inserting characters:
+ if (mlib_mul(&added_len, 2)) {
+ // Doubling the size overflowed. Not likely, but just use the original string
+ // size and grow as-needed.
+ added_len = format_len;
+ }

Avoid computing strlen(format) twice.
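
The same "double, but fall back on overflow" logic can be sketched without mlib as below. `reserve_hint` is a hypothetical helper taking a length that has already been computed once. It uses a plain comparison against `SIZE_MAX / 2` as a stand-in for `mlib_mul`, whose exact behavior is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: returns twice `format_len` as a reserve hint, or
// `format_len` itself if doubling would overflow size_t.
size_t
reserve_hint (size_t format_len)
{
   if (format_len > SIZE_MAX / 2u) {
      // Doubling would overflow: keep the original length and let the
      // caller grow as needed.
      return format_len;
   }
   return format_len * 2u;
}
```

A caller would compute `strlen (format)` once and pass the result in, which keeps the length computation out of the overflow branch.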

* @brief Pointer to the first vector element, or NULL if the vector is
* empty.
*
* @note DO NOT MODIFY
Contributor

Suggested change
- * @note DO NOT MODIFY
+ * @note DO NOT DIRECTLY MODIFY THIS VALUE

Suggest being a bit more specific about what these notes mean.

// We compare against (signed) SSIZE_MAX because want to support the difference
// between two pointers. If we use the unsigned size, then we could have vectors
// with size that is too large to represent the difference between two sizes.
return SSIZE_MAX / sizeof(T);
Contributor

SSIZE_MAX may require platform compatibility workarounds (e.g. see here). Consider using (SIZE_MAX / 2u - 1u) instead.
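
For illustration, the suggested bound could be applied as in the sketch below. `vec_max_count` is a hypothetical name, and `elem_size` is assumed to be non-zero (as `sizeof (T)` always is).

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: the largest element count whose total byte size
// stays within the suggested (SIZE_MAX / 2u - 1u) limit.
// Assumption: elem_size is never zero.
size_t
vec_max_count (size_t elem_size)
{
   const size_t max_bytes = SIZE_MAX / 2u - 1u;
   return max_bytes / elem_size;
}
```

This relies only on `SIZE_MAX` from `<stdint.h>`, so it avoids the POSIX-specific `SSIZE_MAX`.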

Comment on lines +323 to +325
if (mlib_add(&count, 1)) {
// Adding another element would overflow size_t. This is extremely unlikely,
// but precautionary.
Contributor

Consider importing BSON_(UN)LIKELY into mlib?
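
A branch-hint macro of this kind is commonly built on `__builtin_expect`, which GCC and Clang provide. The sketch below uses the hypothetical names `SKETCH_UNLIKELY` and `checked_increment`; it is not the definition of `BSON_UNLIKELY` and not an existing mlib API.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical macro: hints that the condition is rarely true on compilers
// that support __builtin_expect, and is a plain truth test elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(x) __builtin_expect (!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (!!(x))
#endif

// Hypothetical helper: increments *count, returning non-zero (and leaving
// *count unchanged) if the increment would overflow size_t.
int
checked_increment (size_t *count)
{
   if (SKETCH_UNLIKELY (*count == SIZE_MAX)) {
      return 1;
   }
   ++*count;
   return 0;
}
```

The hint only affects code layout and branch prediction, so the function returns the same result with or without it.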
