lasso issue

noetsi

No cake for spunky
#1
I circled back to lasso because I have 48 variables, and that is simply too much, in my humble opinion, for my data (the whole data set has about 3200 cases, but I often slice it down to as few as 48). I am not using k-fold validation because it does not work well with so few cases. Nor am I building a training data set, for the same reason.

I always get the following warning.

WARNING: The adaptive weights for the LASSO method are not uniquely determined because the full least squares model is singular.

I am not sure how to fix this problem.

Code:
proc glmselect data= work.setest;
CLASS 
"Age 25 to 44"n (ref ="0")
"Associate’s degree"n (ref ="0")
"Bachelor’s degree"n (ref ="0")
"Beyond a bachelor’s degree"n (ref ="0")
"High school diploma or equivalen"n (ref ="0")
/*"Individuals has a significant di"n (ref ="0")removed for SE analysis */
"Postsecondary education no degre"n (ref ="0")
"Race: Black"n (ref ="0")
"Race: More than one"n (ref ="0")
"Special education certicate/comp"n (ref ="0")
"Age 19 to 24"n (ref ="0")
"Age 45 to 54"n (ref ="0")
"Age 55 to 59"n (ref ="0")
"Age 60+"n (ref ="0")
'Age 16 to 18'n (ref ="0")
"Race: Asian"n (ref ="0")
"Race: Hawaiian/Pacific Islander"n (ref ="0")
"Race: White"n (ref ="0")
 "Foster care youth"n (ref ="0")
"Psychosocial and psychological d"n (ref ="0")
"Intellectual and learning disabi"n (ref ="0")
"Physical disability"n (ref ="0")
"Auditory and communicative disab"n (ref ="0")
Veteran (ref ="0")
"TANF recipient"n (ref ="0")
"Single parent"n (ref ="0")
/*"Received career services"n (ref ="0") */
/*"Received training services"n (ref ="0")*/
/*"Received other services"n (ref ="0")*/
"Received public support at appli"n (ref ="0")
"Employed at application"n (ref ="0")
"Homeless individual, runaway you"n (ref ="0")
"Low-income"n (ref ="0")
"Limited English-language profici"n (ref ="0")
"Migrant and seasonal farmworker"n (ref ="0")
"Long-term unemployed"n (ref ="0")
/* "Individuals is most significant"n (ref ="0")removed for SE analysis */
"Ethnicity-Hispanic Ethnicity"n (ref ="0")
"Ex-offender"n (ref ="0")
"Displaced homemaker"n (ref ="0")
Female (ref ="0")

    ;
    MODEL Qtr2_Wage=    
"Age 25 to 44"n 
"Associate’s degree"n 
"Bachelor’s degree"n
"Beyond a bachelor’s degree"n
"High school diploma or equivalen"n 
/*"Individuals has a significant di"n */
"Postsecondary education no degre"n 
"Race: Black"n 
"Race: More than one"n
"Special education certicate/comp"n
"Age 19 to 24"n 
"Age 45 to 54"n 
"Age 55 to 59"n
"Age 60+"n
'Age 16 to 18'n 
"Race: Asian"n 
"Race: Hawaiian/Pacific Islander"n 
"Race: White"n
"Foster care youth"n
"Psychosocial and psychological d"n 
"Intellectual and learning disabi"n
"Physical disability"n 
"Auditory and communicative disab"n 
Veteran
"TANF recipient"n
"Single parent"n 
/*"Received career services"n
"Received training services"n 
"Received other services"n */
"Received public support at appli"n
"Employed at application"n
"Homeless individual, runaway you"n 
"Low-income"n 
"Limited English-language profici"n
"Migrant and seasonal farmworker"n 
"Long-term unemployed"n 
/*"Individuals is most significant"n */
"Ethnicity-Hispanic Ethnicity"n 
"Ex-offender"n 
"Displaced homemaker"n
Female 
"Construction Employment"n 
"Educational, or Health Care Rela"n 
"Financial Services Employment"n
"Information Services Employment"n
"Leisure, Hospitality, or Enterta"n
"Natural Resources Employment"n 
"Other Services Employment"n 
"Trade and Transportation Employm"n 
"Professional and Business Servic"n 
"Manufacturing Related Employment"n
"totalgovernment"n
 
/ selection=lasso(adaptive choose=sbc stop=none);

run;
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
We should figure out why you are getting the error, but what happens when you drop the adaptive option from the code? Adaptive is just providing empiric starting values.
 

noetsi

No cake for spunky
#3
I will try that, although this has not mattered much in the past. I used adaptive LASSO because some suggest it has a theoretical advantage over plain LASSO in choosing the right variables (the most important ones).
 

noetsi

No cake for spunky
#4
Hopefully this is the right code. When I do it the error goes away.

/ selection=lasso(choose=sbc stop=none);
 

noetsi

No cake for spunky
#5
Ok I ran the model without getting a warning. So the model prints out this:



[SAS prints this out in the results every time I run lasso; I don't know if it's a warning or not]
Selection stopped because all candidate effects for entry are linearly dependent on effects in the model.
[then this]
The selected model, based on SBC, is the model at Step 8.

Then it lists a series of variables, which I assume are the ones it chose. The problem is that it picks some levels of a categorical variable, notably education, but not others. When I run the final model, do I fold the unselected levels into the reference group (the intercept), or do I use the full set of dummy variables that make up education? It selected most of them but left out a few. Also, there are ten sector predictors and it chose only one. Is it best to leave that one in, or to exclude it because the other nine were excluded? I don't have much experience with lasso, and none of the articles I read address this.

[Attached image: 1635892034538.png]
 

hlsmith

Less is more. Stay pure. Stay poor.
#6
Hmm, what does the web say about "entry are linearly dependent on effects"? Are the DV or IVs fully explained by the other terms? For example, if I put BMI and then height and weight in the model, BMI is fully explained by the other terms and SAS should yell at me.
 

noetsi

No cake for spunky
#7
This shows up every single time I run lasso on any data set, regardless of the variables. And last time I did a test for multicollinearity, my VIF was fine. I think this always shows up in lasso.
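A VIF check of this kind can be sketched with PROC REG. This is only a sketch under assumptions that are not confirmed anywhere in the thread: that the dummies are stored as numeric 0/1 variables (PROC REG has no CLASS statement), and that the six age bands are meant to cover every case. If they do, the six columns sum to 1 for every row and duplicate the intercept, in which case PROC REG should report that the model is not full rank instead of returning ordinary VIFs.

```sas
/* Sketch only: assumes the age dummies are numeric 0/1 variables. */
/* Whether the six bands really cover every case is an assumption. */
proc reg data=work.setest;
    model Qtr2_Wage =
        "Age 16 to 18"n
        "Age 19 to 24"n
        "Age 25 to 44"n
        "Age 45 to 54"n
        "Age 55 to 59"n
        "Age 60+"n
        / vif tol;   /* variance inflation factors and tolerances */
run;
quit;
```

The same check could be repeated for the race and sector dummies, or on a 48-case slice, where the number of predictors alone may be enough to make the least squares fit singular.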

If it were what you say it is, hlsmith, then it should also show up in genmod, because I use the same variables. But I never get that warning with genmod.

I found a number of articles online that have it in their output. None address it, so I assume it is a default.

Does this comment not show up for you, hlsmith, when you run lasso?

27.glmselect.example.pdf (usu.edu)

How do you run it? What does your MODEL statement look like when you request lasso? I chose one that someone else used, but I do not know the best approach. K-fold validation failed for me before, even though I had more data then.
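For comparison, a k-fold version of the same request might look roughly like the sketch below. It assumes the CHOOSE=CV and CVMETHOD=RANDOM(n) options of PROC GLMSELECT; the five folds, the seed value, and the two-variable predictor list are placeholders for illustration, not a recommendation.

```sas
/* Sketch only: 5 folds, the seed, and the short predictor list are placeholders. */
proc glmselect data=work.setest seed=12345;
    class Veteran (ref="0") Female (ref="0");
    model Qtr2_Wage = Veteran Female
        / selection=lasso(choose=cv stop=none)   /* pick the step by cross validation */
          cvmethod=random(5);                    /* random assignment to 5 folds */
run;
```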
 

noetsi

No cake for spunky
#9
That's not a linear relationship though so it won't cause any issues directly.
I actually don't believe this is the problem. I run the same data with genmod and do not get any error. This message appears to always show up in SAS when you run lasso, although I am not sure, since I cannot find documentation on it, including in online searches. Published analyses have the sentence in their output, but they never discuss what it means. That is why I assume it is automatic.

Dason, when you do lasso, what is your selection criterion: k-fold validation, SBC, AIC...? (I know you do this in R, not SAS.)
 

Dason

Ambassador to the humans
#11
If you did the logs of BMI, weight, and height then you should end up with an issue. But this is just off the top of my head without any formal checking. I guess we'll see how good my "off the top of my head" math is these days.
 

hlsmith

Less is more. Stay pure. Stay poor.
#13
OK. I simulated a dataset and created BMI, Weight, Height, Stupid (= Height + Weight), and an independent random variable labelled y. The software did not complain when I used any of the following models:

y = BMI + Height + Weight
BMI = Height + Weight
Stupid = Height + Weight

I used versions of the following code, so I am not sure why your error is happening; it does not seem to be related to this.

Code:
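%let N=100;

/* Simulated weights (lb) plus an independent random outcome y */
data wt;
    call streaminit(1);
    do i = 1 to &N;
        wt = rand("Normal", 125, 5);
        y  = rand("Normal");
        output;
    end;
run;

/* Simulated heights (in) */
data ht;
    call streaminit(1);
    do i = 1 to &N;
        ht = rand("Normal", 63, 3);
        output;
    end;
run;

/* Sort each data set so the merge pairs small weights with small heights */
proc sort data=wt; by wt; run;
proc sort data=ht; by ht; run;

data TS;
    merge wt ht;               /* no BY statement: rows are matched by position */
    height = ht / 39.37;       /* inches to metres */
    weight = wt / 2.205;       /* pounds to kilograms */
    BMI    = weight / (height**2);
    Stupid = height + weight;  /* exact linear combination of the two predictors */
run;

proc glmselect data=TS;
    model Stupid = weight height / selection=lasso(adaptive choose=sbc stop=none);
run;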
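
None of the three models above puts an exact linear combination on the right-hand side: BMI is not linear in height and weight, and the other two models have only height and weight as predictors. A variant that does is sketched below, reusing the TS data set. I have not run it, so treat the expected behavior as a guess: with Stupid entered next to height and weight, the full least squares model should be singular, which is the condition named in the adaptive-weights warning. The PROC REG step is only a cross-check on the rank of the same model.

```sas
/* Sketch only: Stupid = height + weight, so the three predictors are linearly dependent. */
proc glmselect data=TS;
    model y = height weight Stupid
        / selection=lasso(adaptive choose=sbc stop=none);
run;

/* Cross-check: ordinary least squares on the same predictors. */
proc reg data=TS;
    model y = height weight Stupid;
run;
quit;
```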