# Recentering data or creating an index based on another variable

#### cruzeconomics

##### New Member
Hello all. I am trying to recenter some data so that I can run a regression discontinuity analysis. I have a data set with variables for the make, model, model year, and percent of production with airbags, and I am trying to create an index based on when airbags became standard (i.e. 100% of production had airbags). I have the following, but I am running into some errors:

Code:
function(){

  for (i in 1:length(levels(cleanData$Model))) {
    y <- cleanData[cleanData$Model %in% levels(cleanData$Model)[i], ]
    z <- subset(y, Airbag.Driver. == 100)
    w <- min(z$ModelYear)

    #if (cleanData$Model == get(x[i])){
    #	cleanData$center <- cleanData$ModelYear - w
    #	}
  }
}

Just running through the first section (before the commented-out if statement), I get this warning:

In min(z$ModelYear) : no non-missing arguments to min; returning Inf

I'm guessing that means that there are some models which don't ever have 100% airbag implementation but I'm not sure how to make the program just ignore those (or leave them coded as 99 or something).

After that, I want to make a new variable (already in the cleanData dataset as cleanData$center) that makes the year that airbags became standard for that model 0 and the year before -1, the year after 1, etc. (e.g. ford taurus airbags became standard in 1990, so 1988 => -2, 1989 => -1, 1990 => 0, 1991 => 1, etc.).

Any help with this would be greatly appreciated. Thanks!

#### bryangoodrich

##### Probably A Mammal
For centering, you might want to check out the scale function.

Code:
c(scale(1988:1994, center = 1990, FALSE))
# [1] -2 -1  0  1  2  3  4

I use c(...) because it returns it as a vector instead of a matrix.

As for min returning Inf, I assume there's something weird about the data. Run

Code:
str(cleanData)

The ModelYear must be numeric for min to be meaningful. Also, your code, to me, would look better this way

Code:
for (i in seq(levels(cleanData$Model))) {
  y <- cleanData[cleanData$Model == levels(cleanData$Model)[i], ]
There's no reason to use %in% when you can only have 1 level you're referencing with that 'i' integer. Therefore, do an equality comparison. If you're doing sequences from 1 to something, just use seq. I'd also probably just store levels(cleanData$Model) in its own object because you're making references to it so much.

#### cruzeconomics

##### New Member
Thanks for the help! I will check out the scale function, but as for the other problems, cleanData$ModelYear is an integer variable, so it should still work for min(). I think the problem arises from the fact that some car models never reach Airbag.Driver. == 100, so not every model is included. Not sure though. I still think min should work because it works if I use it on individual models (like the taurus).

Thanks for the help cleaning up the for statement though!
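
The guess that some models never reach 100% can be checked directly before running the loop. The following is only a minimal sketch on invented toy data: the column names follow the thread, while the values and the model name `modelB` are hypothetical.

```r
# Toy data; column names mirror the thread, values are invented for illustration.
toy <- data.frame(
  Model          = factor(c("taurus", "taurus", "taurus", "modelB", "modelB")),
  ModelYear      = c(1989L, 1990L, 1991L, 1989L, 1990L),
  Airbag.Driver. = c(50, 100, 100, 0, 0)
)

# TRUE for models with at least one year at 100% driver airbags.
reaches_100 <- tapply(toy$Airbag.Driver. == 100, toy$Model, any)

# Models whose Airbag.Driver. == 100 subset would be empty inside the loop.
never_100 <- names(reaches_100)[!reaches_100]
print(never_100)
```

Any model listed in `never_100` is one for which `min(z$ModelYear)` would be called on an empty vector.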


#### bryangoodrich

##### Probably A Mammal
I'm still unclear about the warning and why it would come out. If you have values which should be empty, set them to NA.

Code:
cleanData[someBooleanToReferenceNARecords, 'ModelYear'] <- NA

#### cruzeconomics

There are no empty cells in the whole data set (I've already cleaned it) but there are some car models (cleanData$Model) which never fully implement airbags, so that the subset line:

Code:
z <- subset(y, Airbag.Driver. == 100)

returns an empty data set. I was thinking that this may be the source of the problem. (e.g. the ford *** never has an airbag in my data set, but there are several years of observation, so this would return an empty subset with the above code)

#### bryangoodrich

##### Probably A Mammal
Okay, I get it. So when you go through the loop you'll come to an empty subset you're trying to run min on. I replicated your result.

Code:
> df <- data.frame(A = rnorm(10), B = gl(5,2))
> df
             A B
1   1.50612055 1
2  -0.25453457 1
3  -0.01874759 2
4  -2.50998664 2
5   0.31753606 3
6   0.65032532 3
7  -1.13789731 4
8   0.17341094 4
9   2.33955179 5
10 -0.58415921 5
> df[df$B == '6', ]
[1] A B
<0 rows> (or 0-length row.names)
> min(df[df$B == '6', 1])
[1] Inf
Warning message:
In min(df[df$B == "6", 1]) : no non-missing arguments to min; returning Inf

#### cruzeconomics

##### New Member
That's my problem exactly. Any ideas on how to fix it?

#### bryangoodrich

##### Probably A Mammal
There is no "fix" because you're trying to run an operation on something that doesn't exist! Instead, avoid the operation.

Code:
if (length(y[[1]]) == 0)  {  # Does not contain any records
w <- NA
} else
w <- min(...)
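
The guard above can be wrapped in a small helper so the check and the call to min stay together. This is only a sketch; `min_or_na` is a hypothetical name, not something from the thread or from base R.

```r
# Hypothetical helper: smallest value of x, or NA when x has no elements.
min_or_na <- function(x) {
  if (length(x) == 0) {
    NA          # nothing to compare, so skip min() and its warning
  } else {
    min(x)
  }
}

empty_result  <- min_or_na(integer(0))       # empty input, no warning raised
normal_result <- min_or_na(c(1992L, 1990L))  # ordinary input
print(empty_result)
print(normal_result)
```

Note that `length` is applied here to a plain vector (a single column), where it counts elements.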

#### cruzeconomics

##### New Member
I'm still running into the same warning message. (Sorry for being a bit inept). This is what I've got now:

Code:
function(){

  indexedData <- matrix(0, 1, dim(cleanData)[2])
  colnames(indexedData) <- c("ModelYear", "Make", "Model", "Production", "Airbag.Driver.", "center")

  for (i in seq(levels(cleanData$Model))) {
    y <- cleanData[cleanData$Model == levels(cleanData$Model)[i], ]
    z <- subset(y, Airbag.Driver. == 100)
    #w <- min(z$ModelYear)

    if (length(z) == 0)  {  # Does not contain any records
      w <- NA
    } else {
      w <- min(z$ModelYear)
    }

    y$center <- c(scale(y$ModelYear, center = w, FALSE))
    indexedData <- rbind(indexedData, y)

    #if (cleanData$Model == get(x[i])){
    #	cleanData$center <- cleanData$ModelYear - w
    #	}
  }

  indexedData <<- indexedData
}
Still returning the same warning message:

In min(z$ModelYear) : no non-missing arguments to min; returning Inf

#### bryangoodrich

##### Probably A Mammal
What type of object is z? How can you determine the number of observations it contains?

#### Dason

##### Ambassador to the humans
My guess is if it's a data frame you'll just want to replace length with nrow or dim(whatever)[1] or something similar.

#### cruzeconomics

##### New Member
I think I may have gotten it to work. Here's what I've got:

Code:
function(){

  indexedData <- matrix(0, 1, dim(cleanData)[2])
  colnames(indexedData) <- c("ModelYear", "Make", "Model", "Production", "Airbag.Driver.", "center")

  for (i in seq(levels(cleanData$Model))) {
    y <- cleanData[cleanData$Model == levels(cleanData$Model)[i], ]
    z <- subset(y, Airbag.Driver. == 100)

    if (dim(z)[1] == 0)  {  # Does not contain any records
      w <- 0
    } else {
      w <- min(z$ModelYear)
    }

    if (w >= 1984 && w <= 1994){
      y$center <- c(scale(y$ModelYear, center = w, FALSE))
    } else {
      y$center <- matrix(99, dim(y)[1], 1)
    }

    indexedData <- rbind(indexedData, y)
  }

  indexedData <<- indexedData
}
Thank you for all of your help!
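
The difference between length and nrow that made the earlier check fail can be seen on a toy data frame. This is a minimal sketch with invented values; only the column names come from the thread.

```r
# Toy data frame in which no row has Airbag.Driver. == 100.
toy <- data.frame(ModelYear = 1988:1990, Airbag.Driver. = c(0, 50, 75))
empty <- subset(toy, Airbag.Driver. == 100)

n_cols <- length(empty)  # a data frame is a list of columns, so this counts columns
n_rows <- nrow(empty)    # this counts observations

print(n_cols)
print(n_rows)
```

Because `length(z)` counts columns, `length(z) == 0` is FALSE for an empty subset that still has its columns, so the else branch ran min on an empty vector.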

#### Dason

Code:
if(!is.finite(suppressWarnings(w <- min(z$ModelYear)))){w <- 0}
#or
w <- ifelse(is.finite(suppressWarnings(w <- min())), w, 0)

Then again the if/else is probably more clear as to what is going on and probably runs faster too. But I had fun making it.

But really I'm wondering why you need to do this in the first place? You're just getting a warning (it's not actually an error) and it returns Inf so w will contain Inf and it looks like that isn't going to mess up any of your other checks later. So if it's just the warning that is bugging you you could just do:

Code:
w <- suppressWarnings(min(z$ModelYear))
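
For comparison, the same index can be built without a loop. The sketch below is an alternative to the approach in the thread, not the method anyone here used: it relies on base R's `ave`, runs on invented toy data, uses the hypothetical model name `modelB`, and marks never-standard models with NA rather than 99.

```r
# Toy data; column names mirror the thread, values are invented for illustration.
toy <- data.frame(
  Model          = factor(rep(c("taurus", "modelB"), each = 4)),
  ModelYear      = rep(1988:1991, times = 2),
  Airbag.Driver. = c(0, 50, 100, 100, 0, 0, 10, 20)
)

# Keep the model year only where airbags are at 100%; NA elsewhere.
year_if_100 <- ifelse(toy$Airbag.Driver. == 100, toy$ModelYear, NA)

# Per model, the first year at 100%, repeated on every row of that model.
# Models that never reach 100% get NA, so min is never run on an empty set.
first_year <- ave(year_if_100, toy$Model, FUN = function(v) {
  if (all(is.na(v))) NA else min(v, na.rm = TRUE)
})

# Centered index: 0 in the year airbags became standard, -1 the year before, etc.
toy$center <- toy$ModelYear - first_year
print(toy)
```

If a sentinel such as 99 is preferred over NA, it can be filled in afterwards with `toy$center[is.na(toy$center)] <- 99`.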