Clean a string variable using other string variables

#1
Hello Friends,

Do you know of any good book on string commands in Stata? I am having trouble understanding the logic of the string commands.

I have a dataset of peer-reviewed articles. One variable is the article's study design. My goal is to have just 3 categories in the design variable: Case-Control, Cohort, and Review.

Example of my design variable:

Regression Analyses Were Used –cohort
A Case -Control Study Using Logistic Re
A Case Control Study
A Hospital Based Case Control Study In
A Meta Analysis Of The Results From Epi
A Population Based Cohort Study Using
A Population-Based Case Control Family
Analyzed The Global Gene Expression Pro
Baseline Self-Administered Questionaire
Case Control Study
Case Series
Case- CONTROL
Case-Control
Case-Control Study
Case-Controlled Study
Case/Control
Cohort

So I want to find "case-control" (or maybe just "case") anywhere in the text and return Case-Control, and do the same for the other categories, Cohort and Review. The matching should handle hyphens, slashes, extra spaces, and upper and lower case.
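A minimal sketch of one way this matching could be done, assuming the variable is a string named design (design_clean and d are hypothetical names chosen for illustration). It lowercases the text, turns hyphens and slashes into spaces, collapses repeated spaces, and then searches for each keyword. It matches "case control" rather than just "case", since "case" alone would also catch values such as Case Series.

```stata
* Sketch only: assumes a string variable named design
generate design_clean = ""

* Working copy: lower case, hyphens and slashes turned into spaces
generate d = lower(design)
replace d = subinstr(d, "-", " ", .)
replace d = subinstr(d, "/", " ", .)

* Trim the ends and collapse runs of blanks to a single blank
replace d = stritrim(strtrim(d))

* Assign a category wherever the keyword appears anywhere in the text
replace design_clean = "Case-Control" if strpos(d, "case control") > 0
replace design_clean = "Cohort" if strpos(d, "cohort") > 0 & design_clean == ""
replace design_clean = "Review" if strpos(d, "review") > 0 & design_clean == ""

drop d

* Inspect the result, including values that matched nothing
tabulate design_clean, missing
```

With the example values above, "Case- CONTROL", "Case/Control" and "Case-Controlled Study" should all normalize to text containing "case control", while "Case Series" is left unassigned.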

Also, I want to use the information I have in other variables, such as the "title" variable, to clean or fix my design variable. Some titles contain the study design, for instance "Breast cancer risk factors in Korean women: a literature review". If I find "literature review" anywhere in the title, then the design should be "Review". Please take into account that I have many missing values in my design variable. That is why I am trying to fix it based on other variables.
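A minimal sketch of how the title could fill in a missing design, assuming string variables named title and design (t is a hypothetical working variable). Only the "literature review" rule comes from the description above. The case-control and cohort rules are assumptions added in the same spirit.

```stata
* Sketch only: assumes string variables named title and design
generate t = lower(title)

* Fill design only where it is currently empty
replace design = "Review" if design == "" & strpos(t, "literature review") > 0

* Assumed extensions: the same idea for the other two categories
replace design = "Case-Control" if design == "" & regexm(t, "case[ -]*control")
replace design = "Cohort" if design == "" & strpos(t, "cohort") > 0

drop t
```

Restricting each replace to design == "" keeps any value already recorded in design and uses the title only as a fallback.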

I am attaching my dataset. Thank you very much. Please help.