# How to check if vector is close to space spanned by 2 basis vectors (lin. regression)

#### Rubber

Here is the situation:
I have measured gene expression data for cells that have been subjected to three different stimuli. This means I have 3 long vectors with numbers that describe how active each gene is in each sample. To clarify: say we are looking at sample one, I have a number for each gene that tells me how active it is in this particular sample. I have the same thing for all my stimuli.

To describe what I want to do in other words: I want to check whether one stimulus, which is possibly a combination of the other two, lies close to the space spanned by those 2 stimuli, and if so, what the relative contribution of each of these 2 stimuli is.

To model the contributions, I first want to assume that one of my stimuli is a linear combination of the other two. I minimize the sum of squared residuals, so I have to solve the following equation for x in the least-squares sense (from what I remember from linear algebra):
A.x=b
where b is the vector of the stimulus I expect to be a combination of the other two. A is a matrix whose two columns are the vectors that should combine to form b. 'x' contains the two coefficients that tell me how important each of those vectors is in constructing the combined vector (the stimulus we expect to be a combination of the other two).
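For concreteness, here is a minimal sketch of that fit in Python with NumPy. The data are simulated, and the names `a1`, `a2`, `x_true`, the number of genes and the noise level are all made up for illustration; they are not my real measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 1000                      # hypothetical number of genes

# Hypothetical expression vectors for the two "basis" stimuli
a1 = rng.normal(size=n_genes)
a2 = rng.normal(size=n_genes)
A = np.column_stack([a1, a2])       # shape (n_genes, 2)

# Made-up contributions and a noisy "combined" stimulus
x_true = np.array([0.7, 0.3])
b = A @ x_true + 0.1 * rng.normal(size=n_genes)

# Least-squares solution: the x that minimizes ||A x - b||^2
x_hat, residual_ss, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print("estimated coefficients:", x_hat)
print("residual sum of squares:", residual_ss)
```

Because A has many more rows than columns, A.x=b generally has no exact solution, so `np.linalg.lstsq` returns the coefficients with the smallest squared residual instead.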

That is what I did; now here are my questions.

(1) I have done some simulations where I know 'x' beforehand and tried to recover it. I noticed that the larger the noise, the more the coefficients shrink toward zero. Also, the larger coefficient drops much more than the smaller one (even relative to its own size). I intuitively understand this a bit, as both coefficients should become more similar as the noise increases (since we are fitting more and more to noise, and both stimuli should be equally good at this). But is there any theory explaining this?
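A minimal sketch of this kind of simulation is below. It rests on assumptions that may not match the real data: noise of the same standard deviation is added to all three vectors (so the columns of A are noisy too, not only b), and the two basis stimuli are correlated with a made-up correlation `rho`. Under those assumptions, shrinkage of this sort is what is usually called attenuation or regression dilution, which comes from measurement noise in the predictors.

```python
import numpy as np

def fit_with_noise(noise_sd, n_genes=5000, x_true=(0.8, 0.2), rho=0.8, seed=0):
    """Simulate noisy measurements and return the least-squares coefficients.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)

    # Two clean basis vectors with correlation of roughly rho
    shared = rng.normal(size=n_genes)
    a1 = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=n_genes)
    a2 = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=n_genes)
    A_clean = np.column_stack([a1, a2])
    b_clean = A_clean @ np.asarray(x_true)

    # Assumption: measurement noise hits every vector, including the predictors
    A_noisy = A_clean + noise_sd * rng.normal(size=A_clean.shape)
    b_noisy = b_clean + noise_sd * rng.normal(size=n_genes)

    x_hat, *_ = np.linalg.lstsq(A_noisy, b_noisy, rcond=None)
    return x_hat

for sd in (0.0, 0.5, 1.0, 2.0):
    print(f"noise sd = {sd}: x_hat = {fit_with_noise(sd)}")
```

In this setup the estimates move toward each other as well as toward zero as `noise_sd` grows, because the difference between the two correlated columns carries less signal than their sum and is swamped by noise first. If noise is added only to b and not to A, least squares stays unbiased and this systematic shrinkage should not appear.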

(2) How can I best evaluate whether one stimulus is indeed a linear combination of the other two? I was thinking of doing a similar analysis with unrelated stimuli, and comparing those residuals to the residuals of my real analysis.
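The residual comparison could be sketched roughly as follows. Since I have no unrelated stimuli in this toy example, the reference distribution is built by permuting the genes of b, which is a stand-in I am assuming here and not the same thing as a real unrelated stimulus. The helper `relative_residual` and all numbers are hypothetical.

```python
import numpy as np

def relative_residual(A, b):
    """Norm of the least-squares residual divided by the norm of b.

    0 means b lies exactly in the span of A's columns; 1 means b is
    orthogonal to that span.
    """
    x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    r = b - A @ x_hat
    return np.linalg.norm(r) / np.linalg.norm(b)

rng = np.random.default_rng(1)
n_genes = 2000                                   # hypothetical size
A = rng.normal(size=(n_genes, 2))                # two simulated basis stimuli
b_combined = A @ np.array([0.6, 0.4]) + 0.3 * rng.normal(size=n_genes)

observed = relative_residual(A, b_combined)

# Reference distribution: the same fit after shuffling the genes of b
null = np.array([
    relative_residual(A, rng.permutation(b_combined)) for _ in range(200)
])

# Fraction of reference fits that are at least as good as the real one
p_like = (np.sum(null <= observed) + 1) / (len(null) + 1)
print("observed relative residual:", observed)
print("reference range:", null.min(), "to", null.max())
print("permutation-style p-value:", p_like)
```

Shuffling genes also destroys any baseline expression pattern that all samples share, so this reference is probably too easy to beat. Real unrelated stimuli, as proposed above, would keep that shared structure and give a stricter comparison.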

(3) Is what I am doing (using linear regression/least squares) the best way to find the relative contributions? We are not sure it will be a linear relationship (e.g. one stimulus might even amplify the other).
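One simple way to probe for amplification, sketched below, is to add a product column a1*a2 to the matrix and see whether the residual drops noticeably. This assumes that amplification shows up as a gene-wise product term, which is only one possible form of non-linearity; the data and the coefficients 0.5, 0.5 and 0.4 are simulated for illustration.

```python
import numpy as np

def fit(A, b):
    """Return least-squares coefficients and the residual sum of squares."""
    x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    r = b - A @ x_hat
    return x_hat, float(r @ r)

rng = np.random.default_rng(2)
n_genes = 2000                                   # hypothetical size
a1 = rng.normal(size=n_genes)
a2 = rng.normal(size=n_genes)

# Simulated combined stimulus with a made-up interaction strength of 0.4
b = 0.5 * a1 + 0.5 * a2 + 0.4 * a1 * a2 + 0.1 * rng.normal(size=n_genes)

A_linear = np.column_stack([a1, a2])             # purely additive model
A_inter = np.column_stack([a1, a2, a1 * a2])     # additive plus product term

x_lin, rss_lin = fit(A_linear, b)
x_int, rss_int = fit(A_inter, b)
print("additive model:     x =", x_lin, " RSS =", rss_lin)
print("with product term:  x =", x_int, " RSS =", rss_int)
```

Adding a column can never increase the residual, so a small drop means little by itself; the size of the drop would need to be judged against a reference, for example the same fit on unrelated or permuted data.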