Recentering data or creating an index based on another variable

#1
Hello all. I am trying to recenter some data so that I can run a regression discontinuity analysis. I have a data set with variables for make, model, model year, and percent of production with airbags, and I am trying to create an index based on when airbags became standard (i.e. 100% of production had airbags). I have the following, but I am running into some errors:

Code:
function(){
	for (i in 1:length(levels(cleanData$Model))){
		y <- cleanData[cleanData$Model %in% levels(cleanData$Model)[i],]
		z <- subset(y, Airbag.Driver. == 100)
		w <- min(z$ModelYear)

		#if (cleanData$Model == get(x[i])){
		#	cleanData$center <- cleanData$ModelYear - w
		#	}
	}
}
Just running the first section (before the commented-out if statement), I get warnings like:

>In min(z$ModelYear) : no non-missing arguments to min; returning Inf

I'm guessing that means that there are some models which don't ever have 100% airbag implementation but I'm not sure how to make the program just ignore those (or leave them coded as 99 or something).

After that, I want to fill a new variable (already in the cleanData dataset as cleanData$center) so that the year airbags became standard for that model is 0, the year before is -1, the year after is 1, etc. (e.g. Ford Taurus airbags became standard in 1990, so 1988 => -2, 1989 => -1, 1990 => 0, 1991 => 1, etc.). Any help with this would be greatly appreciated.

Thanks!
 

bryangoodrich

Probably A Mammal
#2
For centering, you might want to check out the scale function.

Code:
c(scale(1988:1994, center = 1990, FALSE))
# [1] -2 -1  0  1  2  3  4
I use c(...) because it returns it as a vector instead of a matrix.
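
As a quick check of the idea, here is a minimal sketch showing that scale with scale = FALSE amounts to subtracting the center. The years vector and the names centered and by_hand are made up for illustration and are not from the original data.

```r
# Hypothetical model years, for illustration only
years <- 1988:1994

# scale() returns a one-column matrix; c() drops it to a plain vector
centered <- c(scale(years, center = 1990, scale = FALSE))

# Plain subtraction gives the same values
by_hand <- years - 1990

print(centered)
print(all.equal(centered, by_hand))
```

Either form should work for recentering a single model's years; subtraction is arguably easier to read when only one column is involved.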

As for min returning Inf, I assume there's something weird about the data. Run

Code:
str(cleanData)
The ModelYear must be numeric for min to be meaningful.

Also, your code, to me, would look better this way

Code:
	for (i in seq(levels(cleanData$Model))) {		
		y <- cleanData[cleanData$Model == levels(cleanData$Model)[i], ]
There's no reason to use %in% when you can only have 1 level you're referencing with that 'i' integer. Therefore, do an equality comparison. If you're doing sequences from 1 to something, just use seq. I'd also probably just store levels(cleanData$Model) in its own object because you're making references to it so much.
 
#3
Thanks for the help! I will check out the scale function. As for the other problem, cleanData$ModelYear is an integer variable, so it should still work with min(). I think the problem arises because some car models never reach Airbag.Driver. == 100, so not every model is included. Not sure though. I still think min should work, because it works if I use it on individual models (like the Taurus).

Thanks for the help cleaning up the for statement though!
 

bryangoodrich

Probably A Mammal
#4
I'm still unclear about the warning and why it would come out. If you have values which should be empty, set them to NA.

Code:
cleanData[someBooleanToReferenceNARecords, 'ModelYear'] <- NA
 
#5
There are no empty cells in the whole data set (I've already cleaned it) but there are some car models (cleanData$Model) which never fully implement airbags, so that the subset line:

Code:
z <- subset(y, Airbag.Driver. == 100)
returns an empty data set. I was thinking that this may be the source of the problem. (e.g. the ford *** never has an airbag in my data set, but there are several years of observation, so this would return an empty subset with the above code)
 

bryangoodrich

Probably A Mammal
#6
Okay, I get it. So when you go through the loop you'll come to an empty subset you're trying to run min on. I replicated your result.

Code:
> df <- data.frame(A = rnorm(10), B = gl(5,2))
> df
             A B
1   1.50612055 1
2  -0.25453457 1
3  -0.01874759 2
4  -2.50998664 2
5   0.31753606 3
6   0.65032532 3
7  -1.13789731 4
8   0.17341094 4
9   2.33955179 5
10 -0.58415921 5
> df[df$B == '6', ]
[1] A B
<0 rows> (or 0-length row.names)
> min(df[df$B == '6', 1])
[1] Inf
Warning message:
In min(df[df$B == "6", 1]) : no non-missing arguments to min; returning Inf
 

bryangoodrich

Probably A Mammal
#8
There is no "fix" because you're trying to run an operation on something that doesn't exist! Instead, avoid the operation.

Code:
if (length(z[[1]]) == 0) {  # The subset z contains no records
  w <- NA
} else {
  w <- min(...)
}
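
The same guard can be wrapped in a small helper. This is only a sketch: safe_min is a hypothetical name, not a base R function, and it assumes NA is an acceptable placeholder when there is nothing to take a minimum of.

```r
# Hypothetical helper: smallest value, or NA when the input is empty
safe_min <- function(x) {
  if (length(x) == 0) NA else min(x)
}

# An empty vector no longer triggers the warning
safe_min(integer(0))

# A non-empty vector behaves like min()
safe_min(c(1992L, 1990L, 1991L))
```

Note that the check is on the length of a vector (a column), not on the data frame itself.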
 
#9
I'm still running into the same warning message. (Sorry for being a bit inept). This is what I've got now:

Code:
function(){
	indexedData <- matrix(0, 1, dim(cleanData)[2])
	colnames(indexedData) <- c("ModelYear", "Make", "Model", "Production", "Airbag.Driver.", "center")

	for (i in seq(levels(cleanData$Model))) {
		y <- cleanData[cleanData$Model == levels(cleanData$Model)[i], ]
		z <- subset(y, Airbag.Driver. == 100)

		#w <- min(z$ModelYear)

		if (length(z) == 0) {  # Does not contain any records
			w <- NA
		} else {
			w <- min(z$ModelYear)
		}

		y$center <- c(scale(y$ModelYear, center = w, FALSE))
		indexedData <- rbind(indexedData, y)

		#if (cleanData$Model == get(x[i])){
		#	cleanData$center <- cleanData$ModelYear - w
		#	}
	}

	indexedData <<- indexedData
}
Still returning the same

>In min(z$ModelYear) : no non-missing arguments to min; returning Inf

warning message.
 

Dason

Ambassador to the humans
#11
My guess is if it's a data frame you'll just want to replace length with nrow or dim(whatever)[1] or something similar.
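
To illustrate the difference, here is a small sketch with a made-up data frame (the values are hypothetical). On a data frame, length counts columns, so it stays non-zero even when the subset has no rows.

```r
# Toy data frame with hypothetical values
df <- data.frame(ModelYear = 1988:1990, Airbag.Driver. = c(0, 50, 80))

# No row reaches 100, so the subset has zero rows
z <- subset(df, Airbag.Driver. == 100)

length(z)  # counts columns
nrow(z)    # counts rows
```

That is why a test like length(z) == 0 never fires for an empty subset, while nrow(z) == 0 does.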
 
#12
I think I may have gotten it to work. Here's what I've got:

Code:
function(){
	indexedData <- matrix(0, 1, dim(cleanData)[2])
	colnames(indexedData) <- c("ModelYear", "Make", "Model", "Production", "Airbag.Driver.", "center")

	for (i in seq(levels(cleanData$Model))) {
		y <- cleanData[cleanData$Model == levels(cleanData$Model)[i], ]
		z <- subset(y, Airbag.Driver. == 100)

		if (dim(z)[1] == 0) {  # Does not contain any records
			w <- 0
		} else {
			w <- min(z$ModelYear)
		}

		if (w >= 1984 && w <= 1994) {
			y$center <- c(scale(y$ModelYear, center = w, FALSE))
		} else {
			y$center <- matrix(99, dim(y)[1], 1)
		}

		indexedData <- rbind(indexedData, y)
	}

	indexedData <<- indexedData
}
Thank you for all of your help!
 

Dason

Ambassador to the humans
#13
You could probably replace that if/else construct with this too:
Code:
if(!is.finite(suppressWarnings(w <- min(z$ModelYear)))){w <- 0}
#or
w <- ifelse(is.finite(suppressWarnings(w <- min(z$ModelYear))), w, 0)
Then again the if/else is probably more clear as to what is going on and probably runs faster too. But I had fun making it.

But really, I'm wondering why you need to do this in the first place. You're just getting a warning (it's not actually an error), and min returns Inf, so w will contain Inf, which doesn't look like it will mess up any of your later checks. So if it's just the warning that is bugging you, you could just do:

Code:
w <- suppressWarnings(min(z$ModelYear))
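
For what it's worth, the whole loop could probably be avoided. The following is a rough sketch only, using a made-up data frame called cars with the same column names as in the thread. It assumes that models which never reach 100% should get NA rather than 99, which is a swapped-in choice and not what the function above does.

```r
# Toy data; model names and values are made up for illustration
cars <- data.frame(
  Model = c("A", "A", "A", "A", "B", "B", "B"),
  ModelYear = c(1988, 1989, 1990, 1991, 1988, 1989, 1990),
  Airbag.Driver. = c(0, 40, 100, 100, 0, 10, 20)
)

# Keep only the rows where airbags were standard
full <- cars[cars$Airbag.Driver. == 100, ]

# Earliest such year per model; min() only ever sees non-empty groups
first_year <- tapply(full$ModelYear, full$Model, min)

# Look up each row's model; models without a match get NA
idx <- match(cars$Model, names(first_year))
cars$center <- cars$ModelYear - as.vector(first_year)[idx]

print(cars)
```

Because min is only applied to groups that have at least one row, the warning should not come up. Using NA instead of 99 also keeps the placeholder from being mistaken for a real offset in a later regression, though 99 could be substituted afterwards if that coding is needed.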