How do I implement data structures correctly in C (mainly regarding pointers), coming from a Java background? For example, how do I write a constructor correctly?

Suppose I have a "class" (struct) in C as follows:

typedef struct {
    int age;
} Person;

I have seen various ways to implement, for example, a constructor function (and also links to other structs, etc.), and I am not sure which one I should use and when. The first that comes to mind is:

Person newPerson(int age){
    Person person;
    person.age = age;
    return person;
}

Person person = newPerson(4);

This would work the same as it does in Java... But I often see something different online:

Person* /*why?*/ newPerson(int age){
    Person person;
    person.age = age;
    return &person; //why?
}

Person* person = newPerson(4);
//also, why not just: Person person = *newPerson(4);

I know how to work with it, but I am not sure WHY I would return a pointer to the struct instead of just the struct itself. Some websites even go as far as to use malloc:

Person* newPerson(int age){
    Person* person = (Person*)malloc(sizeof(Person)); //why?
    person->age = age;
    return person;
}

Person* person = newPerson(4);

Again, I know HOW to work with all of these other approaches; I just do not understand WHY I would want to. The first method seems easier to understand and implement, so there must be some other reasons why I would use method 2 or 3 (or something similar) instead. If so, could someone explain some of these reasons?

Edit: If you know a website (or search term) that explains this well, I could also use a link to that website instead of an answer

Solution

why would I return a pointer rather than the struct itself?

Don't confuse a C struct with a Java object. A Java object is implicitly represented as a pointer even though you don't see that in the code, but a C struct is not: it's the entire contents of all the struct's fields. If you assign a returned struct to a variable, all the fields are copied. If you assign that variable to another variable of the same type, all the fields are copied again. In Java, by contrast, both variables would refer to the same memory and the same field instances.
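A small sketch of that difference: assigning a struct copies its fields, so changing the copy leaves the original alone, while two pointers to the same struct see each other's changes. The function name copyVersusPointer is hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

typedef struct {
    int age;
} Person;

/* Hypothetical demo: struct assignment vs. pointer copy. */
void copyVersusPointer(void) {
    Person a = { .age = 30 };
    Person b = a;      /* copies every field: b is an independent object */
    b.age = 31;        /* a.age is still 30 */
    printf("after copy:    a.age=%d b.age=%d\n", a.age, b.age);

    Person *p = &a;    /* p refers to a, much like a Java reference */
    Person *q = p;     /* copying a pointer does not copy the struct */
    q->age = 40;       /* changes a itself, so p->age is 40 too */
    printf("after pointer: a.age=%d p->age=%d\n", a.age, p->age);
}
```

The pointer variables p and q behave like two Java references to one object; the struct variables a and b behave like two separate objects.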

As commented by @wohlstad, this option works, but causes Person to be copied on return:

Person newPerson(int age){
    Person person;
    person.age = age;
    return person;
}

Person person = newPerson(4);

The decision of whether an "object" should be copied is the same in C and Java: if it's OK to copy the object, then this is fine, especially for small structs. But often you want the object to keep its identity throughout its life, in which case you pass around a pointer to the object rather than copies of it.
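To make the identity point concrete, here is a minimal sketch comparing a function that receives a copy with one that receives a pointer. Both function names are hypothetical.

```c
typedef struct {
    int age;
} Person;

/* Receives a copy: the caller's Person is not changed. */
void birthdayByValue(Person p) {
    p.age++;           /* increments the local copy only */
}

/* Receives a pointer: the caller's Person itself is changed. */
void birthdayByPointer(Person *p) {
    p->age++;          /* increments the caller's object */
}
```

Calling birthdayByValue(x) leaves x.age unchanged, while birthdayByPointer(&x) increments it, which is the behavior a Java programmer would expect from passing an object.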

The second option is wrong and dangerous:

Person* newPerson(int age){
    Person person;
    person.age = age;
    return &person;
}

Person* person1 = newPerson(4);

It's wrong because person is a local (automatic) variable, typically allocated on the stack. Its lifetime ends when newPerson() returns, and that stack space is reused for other purposes in subsequent code, such as when other functions are called. Meanwhile, the variable person1 still points to that memory, so using it is undefined behavior and bad things will result. Most compilers warn when a function returns the address of a local variable.
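If the goal of option 2 was to avoid both copying and malloc, one common pattern (an assumption here, not something the answer above proposes) is to let the caller own the storage and pass its address in. The name initPerson is hypothetical.

```c
typedef struct {
    int age;
} Person;

/* Fills in a Person the caller already owns (stack, static, or heap).
   No pointer dangles, because the storage outlives this call. */
void initPerson(Person *person, int age) {
    person->age = age;
}
```

A caller would write Person p; initPerson(&p, 4); and p stays valid for as long as the caller's own scope lasts.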

The third option, minus the cast on malloc (unnecessary in C, since void * converts implicitly to any object pointer type), is the canonical choice:

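#include <stdlib.h>  // for malloc

Person* newPerson(int age){
    Person* person = malloc(sizeof(Person));
    if (person == NULL)  // malloc returns NULL on failure
        return NULL;
    person->age = age;
    return person;
}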

Person* person1 = newPerson(4);

malloc() allocates memory for the new object. In Java you don't need to do this because the language hides that detail from you, but C does not. This newly allocated memory remains allocated until you release it using free(), so in addition to newPerson() you'd want to define a deletePerson() function, and user code would need to call it explicitly.
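A minimal sketch of such a constructor/destructor pair, assuming the plain (non-opaque) Person typedef from the question:

```c
#include <stdlib.h>

typedef struct {
    int age;
} Person;

/* Allocates a Person on the heap; returns NULL if malloc fails. */
Person *newPerson(int age) {
    Person *person = malloc(sizeof *person);
    if (person == NULL) {
        return NULL;
    }
    person->age = age;
    return person;
}

/* Releases a Person created by newPerson; free(NULL) is a no-op. */
void deletePerson(Person *person) {
    free(person);
}
```

Each newPerson() call should be matched by exactly one deletePerson(). This also covers the question's Person person = *newPerson(4);: that copies the struct out of the heap block and then discards the only pointer to the block, so it can never be freed (a memory leak).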

To write well-structured code that provides type safety and hides details from calling code, you'd implement Person in one C module, with its associated include file defining the interface, and use C "incomplete types":

person.h:

#ifndef PERSON_H
#define PERSON_H

struct person_s;  // Note: we could use "Person" instead of "person_s" here and below
typedef struct person_s *Person;

Person newPerson(int age);
int personAge(Person);
void deletePerson(Person);

#endif

person.c:

#include <stdlib.h>
#include "person.h"

struct person_s {
    int age;
};

// hopefully, the rest of the implementation code is obvious ...
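For concreteness, here is one possible version of the rest of person.c, with the header's declarations inlined so the block compiles on its own. Returning NULL on allocation failure is an added assumption; the API above doesn't specify error handling.

```c
#include <stdlib.h>

/* --- what person.h declares --- */
struct person_s;                  /* incomplete type */
typedef struct person_s *Person;  /* Person is an opaque handle */

Person newPerson(int age);
int personAge(Person person);
void deletePerson(Person person);

/* --- what person.c defines --- */
struct person_s {
    int age;
};

Person newPerson(int age) {
    /* sizeof works here because the struct is complete in this file */
    Person person = malloc(sizeof *person);
    if (person != NULL) {
        person->age = age;
    }
    return person;  /* NULL if allocation failed */
}

int personAge(Person person) {
    return person->age;
}

void deletePerson(Person person) {
    free(person);
}
```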

User code:

#include "person.h"
// ...
void doPersonStuff(void) {
    Person person1 = newPerson(4);
    // ... do stuff with person1
    Person person2 = newPerson(6);
    // ... etc.
    deletePerson(person1);
    deletePerson(person2);
}

struct person_s; is the incomplete type declaration: there's a struct named person_s, but for now, its contents are left unspecified.

Note that the user code doesn't have a clue what's in the struct: it can't dereference a Person or use sizeof(struct person_s). All the details are hidden. Many coders use void * for this, but that's a bad habit, as it's too easy to mix up different void * variables being used for different purposes. For example, if you have a person API and a group API and both use void * as the handle type, you could accidentally pass a person where a group is required, and the compiler won't detect it. With incomplete types, the compiler catches the error.
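A sketch of the person/group example above, with both APIs in one file so it compiles on its own. The Group type and its functions are hypothetical. User code only holds handles, and passing a Group where a Person is expected is a constraint violation, so a conforming compiler must issue a diagnostic.

```c
#include <stdlib.h>

/* What the two headers would expose: incomplete types only. */
typedef struct person_s *Person;
typedef struct group_s  *Group;

Person newPerson(int age);
void deletePerson(Person person);
Group newGroup(void);
void deleteGroup(Group group);

/* User code: can hold and pass handles, never look inside them. */
void userCode(void) {
    Person p = newPerson(4);
    Group g = newGroup();
    /* deletePerson(g);  <- incompatible pointer type: diagnosed.
       With a void * API, the same mistake would compile silently. */
    deleteGroup(g);
    deletePerson(p);
}

/* Implementation side (normally in person.c and group.c). */
struct person_s { int age; };
struct group_s  { int size; };

Person newPerson(int age) {
    Person p = malloc(sizeof *p);
    if (p != NULL) {
        p->age = age;
    }
    return p;
}

void deletePerson(Person person) { free(person); }

Group newGroup(void) {
    Group g = malloc(sizeof *g);
    if (g != NULL) {
        g->size = 0;
    }
    return g;
}

void deleteGroup(Group group) { free(group); }
```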