How to deal with these variables in a classification model?

Hello everyone,

I am looking at how to clean this dataset and prepare it for a classification model (logistic regression, decision trees, etc.). I am having trouble deciding how to handle a few variables, which are listed below.

Date_account_created: mm/dd/yyyy format. Should I bucket these into the four seasons? Or should I use elapsed time, where the earliest date is 0 and each later date is the number of days since then?
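
To make the two options concrete, this is roughly what I have in mind. It is only a sketch in plain Python: the sample dates are made up, and the season boundaries assume northern-hemisphere meteorological seasons, which may not suit the data.

```python
from datetime import datetime

def parse_date(text):
    # Parse an mm/dd/yyyy string into a date object.
    return datetime.strptime(text, "%m/%d/%Y").date()

def season(d):
    # Assumption: northern-hemisphere meteorological seasons.
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "autumn"

# Made-up sample values, not taken from the real dataset.
raw = ["01/15/2014", "06/30/2014", "01/16/2014"]
dates = [parse_date(s) for s in raw]

# Option 1: days elapsed since the earliest date in the column.
first = min(dates)
elapsed_days = [(d - first).days for d in dates]

# Option 2: one season label per row.
seasons = [season(d) for d in dates]
```

The two are not mutually exclusive; both columns could be kept and the model left to decide which one is useful.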

timestamp: yyyymmddhhmmss format. Should I drop the hours, minutes, and seconds?
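
Rather than only truncating, I could split the timestamp into parts, something like the sketch below. The function name and the choice of parts (date, hour, weekday) are my own guesses at what might be useful, and the sample value is made up. It assumes the value is read as a string so that the fixed 14-digit layout is preserved.

```python
from datetime import datetime

def split_timestamp(text):
    # Parse a yyyymmddhhmmss string and keep a few coarse parts.
    ts = datetime.strptime(text, "%Y%m%d%H%M%S")
    return {
        "date": ts.date(),        # calendar date only
        "hour": ts.hour,          # hour of day, 0-23
        "weekday": ts.weekday(),  # Monday is 0, Sunday is 6
    }

# Made-up sample value.
example = split_timestamp("20140115183045")
```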

Date_first_booking: mm/dd/yyyy format. Same issue as the first, but 58% of the values are also missing. Should I put the missing values in their own bucket, or impute them?

Gender: the values are male, female, and other, and some are missing. Should I bucket the missing values with "other", give them their own bucket, or do something else?
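
The "own bucket" option would look something like this sketch. The "missing" label and the `encode_gender` helper are names I made up, and treating unrecognized labels as missing is an assumption on my part.

```python
# "missing" is a label I am adding; the other three come from the data.
CATEGORIES = ["male", "female", "other", "missing"]

def encode_gender(value):
    # Empty or absent values get their own "missing" category.
    label = value.strip().lower() if value else "missing"
    # Assumption: any unrecognized label is also treated as missing.
    if label not in CATEGORIES:
        label = "missing"
    # One-hot encode: a single 1 in the position of the matching category.
    return [1 if label == c else 0 for c in CATEGORIES]
```

For example, `encode_gender("Female")` gives `[0, 1, 0, 0]` and `encode_gender(None)` gives `[0, 0, 0, 1]`.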

Age: should I impute the missing values, or bucket them?
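
Here is a sketch of both options as I understand them. The ages are made up, the bucket boundaries are arbitrary, and using the median as the fill value is just one possible choice.

```python
from statistics import median

# Made-up sample values; None marks a missing age.
ages = [25, None, 40, 31, None]

# Option 1: fill missing ages with the median of the observed ones,
# and keep a 0/1 flag so the model still knows which rows were missing.
observed = [a for a in ages if a is not None]
fill = median(observed)
age_imputed = [a if a is not None else fill for a in ages]
age_missing = [int(a is None) for a in ages]

# Option 2: bucket ages, with missing values in their own bucket.
def age_bucket(a):
    # Arbitrary boundaries chosen for illustration only.
    if a is None:
        return "missing"
    if a < 30:
        return "under_30"
    if a < 50:
        return "30_to_49"
    return "50_plus"

age_buckets = [age_bucket(a) for a in ages]
```

If I go with imputation, I assume the fill value should be computed from the training rows only and then reused for the test rows.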


I hope you guys can help point me in the right direction! Thanks.