reshaping dataframe to matrix

gianmarco

TS Contributor
#1
Hello,
I am admittedly very dumb when it comes at reshaping data in R. So, I would appreciate to have some suggestion on what follows.

I got the appended dataset with three columns:
the first 2 contains codes for sites, the last column is the distance (in meters) between each pairs of sites. So, the first two rows indicate the distance between site 0 and site 6, and between site 0 and site 3.

Now, what I am after is to get a sort of matrix, with the sites in rows AND columns, and pairwise distances put at the intersections of each row/column.

I have made a search, and I think that cast() function in reshape2 package could do the trick, but I was unsuccessful. Maybe it is not the proper way.

Best
Gm

Code:
mydata <- structure(list(IN_FID = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 
9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 
14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L), NEAR_FID = c(6L, 3L, 
4L, 5L, 9L, 10L, 11L, 2L, 7L, 12L, 8L, 1L, 13L, 14L, 6L, 3L, 
4L, 5L, 9L, 10L, 11L, 2L, 7L, 12L, 8L, 0L, 13L, 14L, 6L, 3L, 
4L, 5L, 9L, 10L, 11L, 7L, 12L, 8L, 1L, 0L, 13L, 14L, 6L, 4L, 
5L, 9L, 10L, 11L, 2L, 7L, 12L, 8L, 1L, 0L, 13L, 14L, 6L, 3L, 
5L, 9L, 10L, 11L, 2L, 7L, 12L, 8L, 1L, 0L, 13L, 14L, 6L, 3L, 
4L, 9L, 10L, 11L, 2L, 7L, 12L, 8L, 1L, 0L, 13L, 14L, 3L, 4L, 
5L, 9L, 10L, 11L, 2L, 7L, 12L, 8L, 1L, 0L, 13L, 14L, 6L, 3L, 
4L, 5L, 9L, 10L, 11L, 2L, 12L, 8L, 1L, 0L, 13L, 14L, 6L, 3L, 
4L, 5L, 9L, 10L, 11L, 2L, 7L, 12L, 1L, 0L, 13L, 14L, 6L, 3L, 
4L, 5L, 10L, 11L, 2L, 7L, 12L, 8L, 1L, 0L, 13L, 14L, 6L, 3L, 
4L, 5L, 9L, 11L, 2L, 7L, 12L, 8L, 1L, 0L, 13L, 14L, 6L, 3L, 4L, 
5L, 9L, 10L, 2L, 7L, 12L, 8L, 1L, 0L, 13L, 14L, 6L, 3L, 4L, 5L, 
9L, 10L, 11L, 2L, 7L, 8L, 1L, 0L, 13L, 14L, 6L, 3L, 4L, 5L, 9L, 
10L, 11L, 2L, 7L, 12L, 8L, 1L, 0L, 14L, 6L, 3L, 4L, 5L, 9L, 10L, 
11L, 2L, 7L, 12L, 8L, 1L, 0L, 13L), NEAR_DIST = c(11734.64331, 
14944.7713, 10762.23323, 14402.32647, 7871.806679, 5504.769969, 
4385.852186, 3763.651046, 6333.476493, 7518.014792, 958.7684749, 
905.3988448, 16750.11162, 16826.84516, 11245.61777, 14688.45416, 
10344.94726, 14262.55135, 6966.420303, 4603.437849, 3508.44596, 
4023.44918, 6614.427812, 7775.457295, 644.5470381, 905.3988448, 
16924.79142, 17138.14258, 9440.73243, 11766.51552, 8229.740032, 
10966.77846, 9290.017717, 6944.446859, 5649.613123, 2592.932711, 
3765.151408, 4544.547088, 4023.44918, 3763.651046, 20331.67283, 
20205.44879, 4966.944291, 4892.598534, 1982.563867, 14527.34422, 
13752.75955, 13241.51768, 11766.51552, 10174.46601, 9379.428834, 
15332.91677, 14688.45416, 14944.7713, 31607.82725, 31771.24944, 
1376.323022, 4892.598534, 5480.343405, 9662.327775, 8882.270147, 
8453.85725, 8229.740032, 7588.950997, 7327.695243, 10981.1096, 
10344.94726, 10762.23323, 27114.45071, 27469.61971, 6009.793108, 
1982.563867, 5480.343405, 15044.293, 13964.43041, 13284.29667, 
10966.77846, 9058.118202, 8121.976115, 14900.13396, 14262.55135, 
14402.32647, 31151.12146, 31153.47853, 4966.944291, 1376.323022, 
6009.793108, 9699.710625, 9260.670561, 9015.054166, 9440.73243, 
8938.584611, 8702.043203, 11869.19525, 11245.61777, 11734.64331, 
27832.60457, 28295.64563, 8938.584611, 10174.46601, 7588.950997, 
9058.118202, 11229.47787, 9035.080047, 7796.387895, 2592.932711, 
1190.475728, 7136.227227, 6614.427812, 6333.476493, 22688.33828, 
22410.50216, 11869.19525, 15332.91677, 10981.1096, 14900.13396, 
7171.16916, 4878.143335, 3886.140646, 4544.547088, 7136.227227, 
8308.918259, 644.5470381, 958.7684749, 16281.2088, 16497.06125, 
9699.710625, 14527.34422, 9662.327775, 15044.293, 2427.397811, 
3711.931159, 9290.017717, 11229.47787, 12042.4547, 7171.16916, 
6966.420303, 7871.806679, 19693.40634, 20764.04629, 9260.670561, 
13752.75955, 8882.270147, 13964.43041, 2427.397811, 1302.355353, 
6944.446859, 9035.080047, 9936.320633, 4878.143335, 4603.437849, 
5504.769969, 18901.7059, 19690.54131, 9015.054166, 13241.51768, 
8453.85725, 13284.29667, 3711.931159, 1302.355353, 5649.613123, 
7796.387895, 8736.416633, 3886.140646, 3508.44596, 4385.852186, 
18859.41834, 19475.81952, 8702.043203, 9379.428834, 7327.695243, 
8121.976115, 12042.4547, 9936.320633, 8736.416633, 3765.151408, 
1190.475728, 8308.918259, 7775.457295, 7518.014792, 23857.57997, 
23546.64743, 27832.60457, 31607.82725, 27114.45071, 31151.12146, 
19693.40634, 18901.7059, 18859.41834, 20331.67283, 22688.33828, 
23857.57997, 16281.2088, 16924.79142, 16750.11162, 2653.267341, 
28295.64563, 31771.24944, 27469.61971, 31153.47853, 20764.04629, 
19690.54131, 19475.81952, 20205.44879, 22410.50216, 23546.64743, 
16497.06125, 17138.14258, 16826.84516, 2653.267341)), .Names = c("IN_FID", 
"NEAR_FID", "NEAR_DIST"), class = "data.frame", row.names = c(NA, 
-210L))
 

TheEcologist

Global Moderator
#2
Before the Wickham fanboys and girls charge in and give you something needlessly complex:

Code:
xtabs(NEAR_DIST~IN_FID+NEAR_FID, mydata)
 

gianmarco

TS Contributor
#5
Yes, TE, it's true. But the problem is the asymmetry of my R knowledge: I know how to do some 'advanced' stuff, and on the other hand I get stuck is such silly things. Besides, I do know that data reshaping is the topic that I know least.
 

trinker

ggplot2orBust
#6
A reshape2 approach (though TE's is more elegant) would be:

Code:
library(reshape2)
mydat2 <- melt(mydata, id=c("IN_FID", "NEAR_FID", "NEAR_DIST"))
cast(mydat2, IN_FID~NEAR_FID)
 

bryangoodrich

Probably A Mammal
#8
This is one of those cases were the highly underutilized xtabs is best, but reshape2 dcast handles it well

Code:
reshape2::dcast(mydata, IN_FID ~ NEAR_FID, value.var = "NEAR_DIST", fill = 0)
To piss Jake off, here's how you'd get away with it using pipes!

Code:
library(dplyr)
mydata %>% reshape2::dcast(IN_FID ~ NEAR_FID, value.var = "NEAR_DIST", fill = 0)
mydata %>% xtabs(NEAR_DIST ~ IN_FID + NEAR_FID, .)  # How to use pipes when first argument isn't the actual data object; use period.
Doing a comparison between the xtabs and reshape2 approach, it looks like the overhead in dcast makes it run 1.35x that of xtabs.