Zero-overhead Destructors in C

CNXSoft: This is another guest post by Blu, this time about C programming, and specifically destructors in the C programming language.

If you asked seasoned C++ developers what their favorite features of the C++ language might be, chances are that destructors would be on everybody’s shortlist. Like many other C++ developers, I too do my occasional share of C, and if there’s one feature I dearly miss in C, it is destructors, precisely in their capacity of automating the release of resources at the right moment. But first a disclaimer is in order: many people call the simple application of destructors ‘RAII’ (‘Resource Acquisition Is Initialization’); I find this acronym unnecessarily awkward, obfuscating an otherwise straightforward concept, so you won’t see it again in this text. Instead, I’ll be using the term ‘end-of-scope action’.

Traditionally, end-of-scope (more often end-of-function) actions in C are achieved via deliberate arrangement of the control flow, often via gotos and collector labels, occasionally via long jumps. Unfortunately, gotos and jumps have the propensity to not make the code any easier to read once they accumulate in sufficient quantities. I cut my teeth on BASIC and assembly, and I still read plenty of assembly to this day, but I’m not fond of gotos and try to use them sparingly in C. Had C had destructors, I wouldn’t have to worry ‘What exactly do I need to clean up upon encountering this error now, and do I already have a label for that?’; such questions normally do not arise with destructors.
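
For reference, the goto-and-collector-label style described above tends to look like the following sketch. The function join_copy and its temporaries are hypothetical, made up purely to show the shape of the control flow: each failure point jumps to the label that releases exactly what has been acquired so far.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenates a and b via two temporary copies.
   Returns a malloc'ed string the caller must free, or NULL on failure. */
char *join_copy(const char *a, const char *b) {
    char *result = NULL;
    char *tmp_a = NULL;
    char *tmp_b = NULL;

    assert(a && b);

    tmp_a = malloc(strlen(a) + 1);
    if (!tmp_a)
        goto out;            /* nothing acquired yet */
    strcpy(tmp_a, a);

    tmp_b = malloc(strlen(b) + 1);
    if (!tmp_b)
        goto free_a;         /* only tmp_a needs releasing */
    strcpy(tmp_b, b);

    result = malloc(strlen(tmp_a) + strlen(tmp_b) + 1);
    if (!result)
        goto free_b;         /* both temporaries need releasing */
    strcpy(result, tmp_a);
    strcat(result, tmp_b);

    /* the success path falls through the same labels, in reverse order */
free_b:
    free(tmp_b);
free_a:
    free(tmp_a);
out:
    return result;
}
```

With three resources this is still readable; the bookkeeping of which label matches which failure point is what grows with every additional resource.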

In terms of functionality, destructors are trivial to implement in C: for every local variable that needs an end-of-scope action, you register it in some kind of per-scope ‘registry’, and then explicitly act on the content of that registry upon leaving said scope. Easy-peasy. Unfortunately, there is a performance price associated with that: container iterations at run-time, perhaps concluded with potentially expensive indirect calls. Now, that is something C++ compilers don’t do: they just ‘know’ which local variables have which destructors, and call those directly (virtual destructors notwithstanding) at end-of-scope conditions. Ergo, you won’t see much C code mimicking C++ destructors; in contrast to developers in other popular languages, C developers do not take overhead lightly.

Contemporary C (and more so C++) compilers are ultra-sophisticated code-semantics analysis tools. They have to be, in order to effectively optimize today’s gargantuan code bases. One of their fundamental ‘tricks of the optimization trade’ is ‘constant folding’: the detection of compile-time constants and the subsequent move of related computations from run-time to compile-time, where that makes sense. So, dealing with C++ compiler optimizations day in and day out, and having just gone through yet another ‘gotos, gotos everywhere’ piece of otherwise magnificent C code, I had to ask myself: ‘Can we use constant folding to implement proper, zero-overhead destructors in C?’
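
To make the idea concrete, here is a small sketch of the kind of code constant folding applies to. The names push and sum_three are hypothetical, as is the capacity of 8. Everything the function computes depends only on constants, so an optimizing compiler can typically reduce the whole body to returning a single value, provided push gets inlined.

```c
#include <assert.h>
#include <stddef.h>

#define LIST_MAX_LEN 8 /* hypothetical capacity */

/* Appends v at position count and returns the new element count */
static inline size_t push(int *restrict list, const size_t count, int v) {
    assert(count < LIST_MAX_LEN);
    list[count] = v;
    return count + 1;
}

/* All inputs are compile-time constants: the array contents, the count and
   the loop trip count are known to the compiler once push is inlined */
int sum_three(void) {
    int list[LIST_MAX_LEN];
    size_t count = 0;

    count = push(list, count, 1);
    count = push(list, count, 2);
    count = push(list, count, 3);

    int sum = 0;
    for (size_t i = 0; i < count; ++i)
        sum += list[i];
    return sum;
}
```

Whether a given compiler actually folds this depends on the optimization level and on inlining decisions; inspecting the generated assembly is the way to check.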

Again, we don’t have to be overly smart with the general approach; that remains the same. We just have to ‘persuade’ the C compiler to ‘acknowledge’ the constants in our improvised implementation of destructors, and fold the associated run-time computations accordingly, very much to the same effect as a C++ compiler does with real C++ destructors. So, let’s get to business.

The code we will be examining will contain an implementation of the proposed mechanism and an example use case. For that we need a structure to which we will apply constructors and destructors:

typedef struct {
   int val;
} Foo;

And here are our example constructor (ctor) and destructor (dtor):

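#include <assert.h>  // assert
#include <stdio.h>   // fprintf
#include <stdlib.h>  // atoi, EXIT_SUCCESS, EXIT_FAILURE

void
Foo_ctor(Foo* self) {
    assert(self);

    // init some
    self->val = 42;
    // %p expects a void pointer, hence the cast
    fprintf(stderr, "%s, %p, val: %d\n", __FUNCTION__, (void*)self, self->val);
}

void
Foo_dtor(Foo* self) {
    assert(self);

    // deinit some
    self->val = 43;
    fprintf(stderr, "%s, %p, val: %d\n", __FUNCTION__, (void*)self, self->val);
}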

According to those, our structure Foo is initialized by setting its member ‘val’ to 42, and de-initialized by setting the same member to 43: just arbitrary code we can easily tell apart. At the end of each function, we print out the name of the function, the address of the ‘self’ variable and its val member. Clearly, printing the name of the function *and* the value of val presents an informational redundancy to the reader of the output, but hey, we are generous today : )

Let’s say we have the following declarations in our main:

int main(int argc, char** argv) {
    Foo f;

    if (argc > 1) {
        Foo g;
        Foo h;

        if (atoi(argv[1]) == 42) {
            return EXIT_FAILURE;
        }
    }

    Foo g;
    return EXIT_SUCCESS;
}

If our proposed ctor and dtor magically worked, we would expect the following scenarios to hold true:

# no args ‒ two declarations in the outermost scope
$ ./zod
Foo_ctor, 0xffa1bcb8, val: 42
Foo_ctor, 0xffa1bcb4, val: 42
Foo_dtor, 0xffa1bcb4, val: 43
Foo_dtor, 0xffa1bcb8, val: 43

# one arg different from ‘42’ ‒ two declarations in the outermost and two in the innermost scope, where the innermost declarations precede the second outermost declaration
$ ./zod 1
Foo_ctor, 0xff8fb928, val: 42
Foo_ctor, 0xff8fb924, val: 42
Foo_ctor, 0xff8fb920, val: 42
Foo_dtor, 0xff8fb920, val: 43
Foo_dtor, 0xff8fb924, val: 43
Foo_ctor, 0xff8fb924, val: 42
Foo_dtor, 0xff8fb924, val: 43
Foo_dtor, 0xff8fb928, val: 43

# one arg equal to ‘42’ ‒ one declaration in the outermost scope, two declarations in the innermost scope, followed by an early exit from the function (precluding the second declaration in the outermost scope)
$ ./zod 42
Foo_ctor, 0xffe9a0c8, val: 42
Foo_ctor, 0xffe9a0c4, val: 42
Foo_ctor, 0xffe9a0c0, val: 42
Foo_dtor, 0xffe9a0c0, val: 43
Foo_dtor, 0xffe9a0c4, val: 43
Foo_dtor, 0xffe9a0c8, val: 43

In order to make that a reality, we need to follow our original plan of introducing a ‘registry’ of all constructed variables per scope. For one, our constructor will need to know of such a registry. But more importantly, we need to decide what to keep in that registry. Clearly, we need an association between destructors and variable addresses, or in other words, we need destructor closures. We devise such a structure (‘term_t’ for ‘terminal’):

typedef struct {
   void *self;
   void (*dtor)(void*);
}  term_t;
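
As a quick illustration of what one such closure does in isolation, the sketch below builds a single term_t and fires it. The destructor here, Foo_dtor_erased, is a hypothetical variant that takes a void pointer so that its type matches the dtor member exactly; the article's own Foo_dtor takes a Foo pointer instead.

```c
#include <assert.h>

typedef struct {
    int val;
} Foo;

typedef struct {
    void *self;
    void (*dtor)(void *);
} term_t;

/* Hypothetical dtor with a void* parameter, matching term_t.dtor's type */
static void Foo_dtor_erased(void *p) {
    Foo *self = p;
    assert(self);
    self->val = 43;
}

/* Pairs an object with its destructor, invokes the pair once,
   and returns the object's val as left by the destructor */
int run_closure_once(void) {
    Foo f = { 42 };
    term_t t = { &f, Foo_dtor_erased };

    t.dtor(t.self);
    return f.val;
}
```

The closure carries everything needed to destroy the object later, which is what lets a scope terminator act without knowing any concrete types.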

Now, a registry of those constitutes some kind of a storage, and what is more natural in C than an array? We modify our constructor accordingly:

size_t
Foo_ctor(Foo* self, term_t *restrict list_dtor, const size_t count) {
    assert(self);
    assert(list_dtor);

    // init some
    self->val = 42;
    fprintf(stderr, "%s, %p, val: %d\n", __FUNCTION__, (void*)self, self->val);

    // ctor concludes with registering the dtor closure into the supplied array;
    // Foo_dtor takes a Foo*, so it is cast to the registry's void (*)(void*) type
    return insert_dtor(list_dtor, count, self, (void (*)(void*))Foo_dtor);
}

Now, we clearly did something extra here: we also supplied the registry size as an arg to the constructor, which we subsequently pass down to the tail call that does the actual job:

size_t
insert_dtor(term_t *restrict list_dtor, const size_t count, void *self, void (*dtor)(void*)) {
    assert(self);
    assert(dtor);
    assert(count < LIST_DTOR_MAX_LEN);

    term_t *ip = list_dtor + count;
    ip->self = self;
    ip->dtor = dtor;
    return count + 1;
}

At this point we have a constructor and a destructor that can work with a registry of destructor closures, if we had any. So let’s provide some, shall we?

int main(int argc, char** argv) {
    term_t list_dtor_0[LIST_DTOR_MAX_LEN] = {};
    size_t list_dtor_0_count = 0;

    Foo f;
    list_dtor_0_count = Foo_ctor(&f, list_dtor_0, list_dtor_0_count); // explicit ctor

    if (argc > 1) {
        term_t list_dtor_1[LIST_DTOR_MAX_LEN] = {};
        size_t list_dtor_1_count = 0;

        Foo g;
        list_dtor_1_count = Foo_ctor(&g, list_dtor_1, list_dtor_1_count); // explicit ctor
        Foo h;
        list_dtor_1_count = Foo_ctor(&h, list_dtor_1, list_dtor_1_count); // explicit ctor


        if (atoi(argv[1]) == 42) {
            return EXIT_FAILURE;
        }
    }

    Foo g;
    list_dtor_0_count = Foo_ctor(&g, list_dtor_0, list_dtor_0_count); // explicit ctor

    return EXIT_SUCCESS;
}

Finally, we need something to act upon those destructor-closure registries:

void
term_scope(term_t *restrict list_dtor, const size_t count) {
    term_t *ip = list_dtor + count - 1;
    for (size_t i = 0; i < count; --ip, ++i) {
        assert(ip->dtor);
        ip->dtor(ip->self);
    }
}
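
The reverse-order walk can be checked in isolation with a sketch like the one below. Several things here are assumptions for illustration only: the capacity of 8, the bookkeeping in destroyed and record_dtor, and the index-based loop, which is a variant of term_scope that avoids forming a pointer before the start of the array when count is zero.

```c
#include <assert.h>
#include <stddef.h>

#define LIST_DTOR_MAX_LEN 8 /* assumed capacity; any value >= 3 works here */

typedef struct {
    void *self;
    void (*dtor)(void *);
} term_t;

static size_t
insert_dtor(term_t *restrict list_dtor, const size_t count, void *self, void (*dtor)(void *)) {
    assert(count < LIST_DTOR_MAX_LEN);
    list_dtor[count].self = self;
    list_dtor[count].dtor = dtor;
    return count + 1;
}

/* Index-based variant of term_scope: walks the registry from last to first */
static void
term_scope(term_t *restrict list_dtor, const size_t count) {
    for (size_t i = count; i > 0; --i) {
        assert(list_dtor[i - 1].dtor);
        list_dtor[i - 1].dtor(list_dtor[i - 1].self);
    }
}

/* Hypothetical bookkeeping: ids of destroyed objects, in call order */
static int destroyed[LIST_DTOR_MAX_LEN];
static size_t destroyed_len;

static void record_dtor(void *self) {
    destroyed[destroyed_len++] = *(int *)self;
}

/* Registers three ints as stand-in objects, ends the scope, and returns
   the number of destructor calls that were made */
size_t demo_reverse_order(void) {
    int a = 1, b = 2, c = 3;
    term_t list_dtor[LIST_DTOR_MAX_LEN] = {0};
    size_t count = 0;

    destroyed_len = 0;
    count = insert_dtor(list_dtor, count, &a, record_dtor);
    count = insert_dtor(list_dtor, count, &b, record_dtor);
    count = insert_dtor(list_dtor, count, &c, record_dtor);
    term_scope(list_dtor, count);
    return destroyed_len;
}
```

After demo_reverse_order returns, destroyed holds 3, 2, 1: objects are torn down in the opposite order of their construction, as C++ would do.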

Of course, since we are discussing C here, nobody will call our ‘term_scope’ function unless we place it accordingly ourselves:

int main(int argc, char** argv) {
    term_t list_dtor_0[LIST_DTOR_MAX_LEN] = {};
    size_t list_dtor_0_count = 0;

    Foo f;
    list_dtor_0_count = Foo_ctor(&f, list_dtor_0, list_dtor_0_count); // explicit ctor

    if (argc > 1) {
        term_t list_dtor_1[LIST_DTOR_MAX_LEN] = {};
        size_t list_dtor_1_count = 0;

        Foo g;
        list_dtor_1_count = Foo_ctor(&g, list_dtor_1, list_dtor_1_count); // explicit ctor
        Foo h;
        list_dtor_1_count = Foo_ctor(&h, list_dtor_1, list_dtor_1_count); // explicit ctor
        
        if (atoi(argv[1]) == 42) {
            term_scope(list_dtor_1, list_dtor_1_count); // explicit scope terminator(1)
            term_scope(list_dtor_0, list_dtor_0_count); // explicit scope terminator(0)
            return EXIT_FAILURE;
        }

        term_scope(list_dtor_1, list_dtor_1_count); // explicit scope terminator
    }

    Foo g;
    list_dtor_0_count = Foo_ctor(&g, list_dtor_0, list_dtor_0_count); // explicit ctor

    term_scope(list_dtor_0, list_dtor_0_count); // explicit scope terminator
    return EXIT_SUCCESS;
}

Voila, we have ensured the desired behavior from our test scenarios above. But now follows the ‘billion-dollar’ question: is that mechanism zero-overhead at runtime? It clearly does ‘stuff’ at runtime, so can a compiler optimize that out via constant folding?

Long story short: it could, under a couple of pre-conditions. I will spare you all the versions of the code which do not work (i.e. will never get properly optimized by the compiler); you can experiment and find those for yourself. I will just mention that for the presented scheme to be successfully constant-folded, the compiler needs to be certain of the sizes of the term_t arrays; ergo, there are two things we have to ensure:

The constructor (Foo_ctor) and the registrar function (insert_dtor) have to be inlined.
The registry-size term ‘count’ needs to be a stand-alone variable; placing it in any kind of structure breaks the magic.
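
One possible way to meet the first pre-condition on gcc and clang is the always_inline function attribute, which is a compiler extension rather than standard C. The sketch below assumes that spelling, wrapped in a hypothetical FORCE_INLINE macro, along with an arbitrary LIST_DTOR_MAX_LEN of 8 and a destructor that takes a void pointer; none of these are prescribed by the text above. The count is kept in a plain local, per the second pre-condition.

```c
#include <assert.h>
#include <stddef.h>

#define LIST_DTOR_MAX_LEN 8 /* assumed capacity, not taken from the article */

/* always_inline is a gcc/clang extension; elsewhere fall back to a plain hint */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

typedef struct {
    int val;
} Foo;

typedef struct {
    void *self;
    void (*dtor)(void *);
} term_t;

/* void* signature chosen here so the pointer type matches term_t.dtor */
static void Foo_dtor(void *p) {
    Foo *self = p;
    assert(self);
    self->val = 43;
}

FORCE_INLINE size_t
insert_dtor(term_t *restrict list_dtor, const size_t count, void *self, void (*dtor)(void *)) {
    assert(count < LIST_DTOR_MAX_LEN);
    list_dtor[count].self = self;
    list_dtor[count].dtor = dtor;
    return count + 1;
}

FORCE_INLINE size_t
Foo_ctor(Foo *self, term_t *restrict list_dtor, const size_t count) {
    assert(self);
    self->val = 42;
    return insert_dtor(list_dtor, count, self, Foo_dtor);
}

/* Constructs two objects in one scope and returns the final registry size.
   The count lives in a stand-alone local, not inside a struct. */
size_t construct_two(void) {
    term_t list_dtor_0[LIST_DTOR_MAX_LEN] = {0};
    size_t list_dtor_0_count = 0;
    Foo f, g;

    list_dtor_0_count = Foo_ctor(&f, list_dtor_0, list_dtor_0_count);
    list_dtor_0_count = Foo_ctor(&g, list_dtor_0, list_dtor_0_count);
    return list_dtor_0_count;
}
```

With both functions inlined into the caller, the compiler sees every update of the count as a constant increment from zero, which is what gives it a chance to resolve the registry at compile time.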

Ok, back to the code at hand. It does look somewhat unwieldy, so let’s sprinkle some macro beautifiers over it:

#define begin_scope(level) \
    term_t list_dtor_ ## level[LIST_DTOR_MAX_LEN] = {}; \
    size_t list_dtor_ ## level ## _count = 0;

#define end_scope(level) \
    term_scope(list_dtor_ ## level, list_dtor_ ## level ## _count);

#define construct(level, var, ctor) \
    list_dtor_ ## level ## _count = ctor(&var, list_dtor_ ## level, list_dtor_ ## level ## _count);

Those would allow us to write our example in the following manner:

int main(int argc, char** argv) {
    begin_scope(0);

    Foo f;
    construct(0, f, Foo_ctor);

    if (argc > 1) {
        begin_scope(1);

        Foo g;
        construct(1, g, Foo_ctor);
        Foo h;
        construct(1, h, Foo_ctor);

        if (atoi(argv[1]) == 42) {
            end_scope(1);
            end_scope(0);
            return EXIT_FAILURE;
        }

        end_scope(1);
    }

    Foo g;
    construct(0, g, Foo_ctor);

    end_scope(0);
    return EXIT_SUCCESS;
}
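
Putting the pieces together, a single-file sketch of the whole scheme could look as follows. It deviates from the listings above in a few labeled ways: LIST_DTOR_MAX_LEN is assumed to be 8, the destructor takes a void pointer, the macros omit their trailing semicolons so every call site supplies its own, the printing is replaced by a hypothetical trace buffer ('C' per constructor call, 'D' per destructor call), and the body of main is moved into a function named run_example so it can be called repeatedly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LIST_DTOR_MAX_LEN 8 /* assumed capacity */

typedef struct { int val; } Foo;
typedef struct { void *self; void (*dtor)(void *); } term_t;

/* Hypothetical trace buffer standing in for the fprintf calls */
static char trace[32];
static size_t trace_len;

static void trace_event(char ev) {
    assert(trace_len + 1 < sizeof trace);
    trace[trace_len++] = ev;
    trace[trace_len] = '\0';
}

static inline size_t
insert_dtor(term_t *restrict list_dtor, const size_t count, void *self, void (*dtor)(void *)) {
    assert(self);
    assert(dtor);
    assert(count < LIST_DTOR_MAX_LEN);
    list_dtor[count].self = self;
    list_dtor[count].dtor = dtor;
    return count + 1;
}

/* Runs the registered closures from last to first */
static inline void
term_scope(term_t *restrict list_dtor, const size_t count) {
    for (size_t i = count; i > 0; --i)
        list_dtor[i - 1].dtor(list_dtor[i - 1].self);
}

static void Foo_dtor(void *p) {
    Foo *self = p;
    assert(self);
    self->val = 43;
    trace_event('D');
}

static inline size_t
Foo_ctor(Foo *self, term_t *restrict list_dtor, const size_t count) {
    assert(self);
    self->val = 42;
    trace_event('C');
    return insert_dtor(list_dtor, count, self, Foo_dtor);
}

#define begin_scope(level) \
    term_t list_dtor_ ## level[LIST_DTOR_MAX_LEN] = {0}; \
    size_t list_dtor_ ## level ## _count = 0

#define end_scope(level) \
    term_scope(list_dtor_ ## level, list_dtor_ ## level ## _count)

#define construct(level, var, ctor) \
    list_dtor_ ## level ## _count = ctor(&var, list_dtor_ ## level, list_dtor_ ## level ## _count)

/* Same control flow as the example main; returns the ctor/dtor trace */
const char *run_example(int argc, char **argv) {
    trace_len = 0;
    trace[0] = '\0';

    begin_scope(0);
    Foo f;
    construct(0, f, Foo_ctor);

    if (argc > 1) {
        begin_scope(1);
        Foo g;
        construct(1, g, Foo_ctor);
        Foo h;
        construct(1, h, Foo_ctor);

        if (atoi(argv[1]) == 42) {
            end_scope(1);
            end_scope(0);
            return trace;
        }
        end_scope(1);
    }

    Foo g;
    construct(0, g, Foo_ctor);

    end_scope(0);
    return trace;
}
```

Called with no extra argument, with "1", and with "42", run_example yields the traces CCDD, CCCDDCDD and CCCDDD respectively, mirroring the three expected scenarios listed earlier.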

Not so bad now, eh? You can get all discussed code from here.

As a final note, this technique works to a zero-overhead result in gcc and clang. Other popular compilers might or might not constant-fold our expressions; for instance, Intel’s icc and Microsoft’s msvc fail to do so as of their current versions. Hopefully their constant-folding abilities will improve in the future.

Jean-Luc Aufranc (CNXSoft)

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager, and starting to write daily news, and reviews full time later in 2011.

18 Comments

Vbextreme

5 years ago

GCC has RAII for C: you can set a function to be called at end of scope with the cleanup attribute.

Generic abstraction
https://github.com/vbextreme/vbar/blob/master/include/vbar/type.h#L39

File Abstraction
https://github.com/vbextreme/vbar/blob/master/include/vbar/file.h#L21

Implementation
https://github.com/vbextreme/vbar/blob/master/core/file.c#L66

File autoclose at end of scope
https://github.com/vbextreme/vbar/blob/master/modules/M_cpu.c#L25

blu

That’s quite convenient and news to me. Thanks for the heads-up!

js0x0

Compiler extensions are great if you don’t care about portability.
Like if you’re developing a library that’s only ever going to be compiled with GCC, or something embedded (where there’s so much implementation defined behaviour anyway).

KR_

If you care about C portability you’re using either GCC or Clang and both support this extension: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization#Clang_and_GCC_%22cleanup%22_extension_for_C

You won’t use MSVC since it’s not even C99 compatible and you won’t use ICC since it’s no better than GCC at C as opposed to C++ where it only has a few marginal performance benefits when targeting Intel x86 hardware (and not AMD’s).

Diego Sueiro

This is a very interesting approach.
Is there any “real life” example using it?
I’m especially interested in microcontroller RTOS applications.

To the ‘real-life’ part of your question, I’ve seen similar approaches in C before, I just can’t remember in what projects, but I guess that indicates there are real-life applications. I just don’t think that any of those have been designed for zero-overhead.

There are other garbage collectors, but they tend to be either general purpose (therefore large and slow) or application specific (and therefore buried in someone’s codebase and difficult to discover).

There’s an omission in the embedded sample code — all ‘list_dtor_#’ declarations should be of [LIST_DTOR_MAX_LEN] elements (it’s an array, after all). Code in the repo is corrected. My fault.

Matlo

What about using a linked list?

I haven’t tried linked lists, but judging by how easy it is to confuse the compiler and lose the constant-folding optimisation, my guess would be that linked lists would preclude the discussed optimisation. But if you’d want to use linked lists for size considerations, note that the array size in the current implementation can be arbitrary: the closure arrays get optimized out anyway.

Gégé

I do not agree with your definition of “constructor” and “destructor”: since you call them manually, they are just “initialize” and “finalize” functions.
If you want real {con,de}structor behaviour, you can apply special attributes to your functions. For example, the GNU C Compiler implements both constructor and destructor attributes. I guess that other compilers implement such a feature too…
The big advantage of using these is that, in the case of a library, you enforce the call of your functions.
Unfortunately, you cannot pass special arguments like you do with this kind of {con,de}structor.

https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/Function-Attributes.html

gcc constructor and destructor function attributes do something else — they provide invocation before and after main. The code in this article does constructors and destructors at end of scope.

sergey kaydalov

For GCC users there is another way to call a destructor function. Each variable can have a cleanup attribute and a corresponding handler which is called when the variable goes out of scope. It is not zero-overhead, but you will never forget to call a destructor. https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes

SomeDumbName37

4 years ago

This is great, thanks. I used this to make some simple generic code:

#include
#include
#include

typedef struct { char * str; } foo;

void foo_init( foo * f ) { printf("foo init\n"); f->str = NULL; }

void foo_cleanup( foo * f ) { printf("foo cleanup\n"); if( f->str ) free( f->str ); }

#define DTOR(t,v) t v __attribute__ ((cleanup (t##_cleanup))); t##_init(&v)

int main( int ac, char * av[] ) {
    DTOR(foo, x);
    x.str = strdup("normal exit");
    if( ac > 1 && !strcmp( av[1], "early")) {
        printf("exit early\n");
        return 1;
    } else if( ac > 1 && !strcmp( av[1], "goto")) {…

T-C

So glad I only do PHP

You’re missing a great deal of fun! : )

edrrwwf

gcc -o zod ./zod.c
./zod.c: In function ‘Foo_ctor’:
./zod.c:76:45: warning: passing argument 4 of ‘insert_dtor’ from incompatible pointer type [-Wincompatible-pointer-types]
 return insert_dtor(list_dtor, count, self, Foo_dtor);
 ^~~~~~~~
./zod.c:22:1: note: expected ‘void (*)(void *)’ but argument is of type ‘void (*)(Foo *) {aka void (*)(struct *)}’
 insert_dtor(term_t *restrict list_dtor, const size_t count, void *self, void (*dtor)(void*)) {
 ^~~~~~~~~~~
At top level:
./zod.c:67:1: warning: always_inline function might not be inlinable [-Wattributes]
 Foo_ctor(Foo* self, term_t *restrict list_dtor, const size_t count) {
 ^~~~~~~~
./zod.c:22:1: warning: always_inline function might not be inlinable [-Wattributes]
 insert_dtor(term_t *restrict list_dtor, const size_t count, void *self, void (*dtor)(void*)) {
 ^~~~~~~~~~~
./zod.c:13:1:…

erfr me

This is not working for me:
assert is not showing anything
(macOSX, and arm6)