two data sets separation

#1
Hi there,
I need some help with finding a separation value between two data sets.
I am trying to identify burned areas based on pixel values in the near-infrared (NIR) and short-wave infrared (SWIR) bands registered before and after a forest fire. When a forest fire happens and a burned area appears on the satellite image, NIR values drop and SWIR values rise between date 1 and date 2. This change is characterized through an index calculation (Relative Difference Normalized Burn Ratio, RDNBR). I want to find a limit between the RDNBR values of a burned area and the RDNBR values of a non-burned area in order to calibrate my model and find all the burned areas in a large set of satellite imagery.
I visually identified two sets of pixels in my images: the first set (conIncendios) corresponds to pixels that were covered with forest on date 1 and burned on date 2; the second set (sinIncendios) corresponds to pixels that were forest on both date 1 and date 2 (no change).
The result I get is two sets with different values, but the two sets overlap, as shown in this boxplot (the first quartile of the first set overlaps the last quartile of the second set):
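For readers unfamiliar with the index, here is a minimal sketch of how RDNBR is typically computed. It assumes the commonly used definition NBR = (NIR - SWIR) / (NIR + SWIR), dNBR = NBR_pre - NBR_post and RDNBR = dNBR / sqrt(|NBR_pre|), without the factor-of-1000 scaling some implementations apply; the exact variant used in this thread is not stated. The function names and reflectance values are hypothetical.

```python
import numpy as np

def nbr(nir, swir):
    """Normalized Burn Ratio for one date (assumed definition)."""
    nir = np.asarray(nir, dtype=float)
    swir = np.asarray(swir, dtype=float)
    return (nir - swir) / (nir + swir)

def rdnbr(nir_pre, swir_pre, nir_post, swir_post):
    """Relative differenced NBR between a pre-fire and a post-fire image.

    Assumes NBR_pre is non-zero; pixels where it is zero would need masking.
    """
    nbr_pre = nbr(nir_pre, swir_pre)
    nbr_post = nbr(nir_post, swir_post)
    dnbr = nbr_pre - nbr_post          # positive when NBR drops after a fire
    return dnbr / np.sqrt(np.abs(nbr_pre))

# Hypothetical reflectances: first pixel burns, second pixel is unchanged.
values = rdnbr(nir_pre=[0.4, 0.4], swir_pre=[0.1, 0.1],
               nir_post=[0.2, 0.4], swir_post=[0.3, 0.1])
print(values)
```

With this sign convention a burned pixel gets a larger RDNBR than an unchanged one, which is the direction assumed when looking for a single cut-off value.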
1599677978763.png
My question is: is there a way of calculating the best separation value between those two sets?
I hope I didn't forget anything useful in this explanation...
Thanks in advance,
 


hlsmith

Less is more. Stay pure. Stay poor.
#2
Look at using a receiver operating characteristic (ROC) curve. It plots sensitivity against 1 - specificity for all cut-offs. Once you have these values, you can use different approaches to choose the split (e.g., the Youden index, which maximizes sensitivity + specificity - 1 and so weights false positives and false negatives equally, or a cost matrix to penalize the types of errors you would find less desirable). There are other approaches, such as entropy and Gini statistics, but I prefer the visual nature of the ROC.

P.S. Besides using the boxplots above, I would also plot the two sets as overlaid histograms.
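The ROC/Youden suggestion above can be sketched as follows. This is only an illustration under stated assumptions: it uses plain NumPy rather than a dedicated ROC library, it assumes burned pixels tend to have the higher RDNBR values, and the two arrays are synthetic stand-ins for the conIncendios and sinIncendios samples (the real values are not given in the thread). The function name `youden_threshold` is hypothetical.

```python
import numpy as np

def youden_threshold(burned, unburned):
    """Return (threshold, J, sensitivity, specificity) maximizing Youden's J.

    A pixel is classified as burned when its value is >= threshold.
    """
    burned = np.asarray(burned, dtype=float)
    unburned = np.asarray(unburned, dtype=float)
    # Every observed value is a candidate cut-off.
    candidates = np.unique(np.concatenate([burned, unburned]))
    # Sensitivity: share of burned pixels at or above the cut-off.
    sens = np.array([(burned >= t).mean() for t in candidates])
    # Specificity: share of unburned pixels below the cut-off.
    spec = np.array([(unburned < t).mean() for t in candidates])
    j = sens + spec - 1.0
    best = int(np.argmax(j))  # first cut-off reaching the maximum J
    return candidates[best], j[best], sens[best], spec[best]

# Synthetic stand-ins for the two labelled pixel sets (assumed distributions).
rng = np.random.default_rng(0)
con_incendios = rng.normal(0.8, 0.25, 500)
sin_incendios = rng.normal(0.1, 0.20, 500)

t, j, se, sp = youden_threshold(con_incendios, sin_incendios)
print(f"threshold={t:.3f}  J={j:.3f}  sensitivity={se:.3f}  specificity={sp:.3f}")
```

If missing a burned area is worse than flagging a false one (or the reverse), the `j` line could be replaced by a weighted sum of the two error rates, which is the cost-matrix variant mentioned above.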
 
#3
Hi hlsmith, many thanks! I'll dig into it and let the community know.