Parsing JSON column from a CSV file

#1
I am trying to parse out some JSON data that is currently in a single column of a .csv file and make that output another .csv file. I have been trying to follow along with what the other threads have said but I'm not having any luck. So, my problem is the following: I have a .csv file that has lots of variables, a few of which are in JSON format. My question is, how do I parse one of these variables out to make a .csv file of just that variable? My data looks like this (this is just one line of it, I have about 150,000 rows of this):

Code:
foo$JSON_var[1] <- 
[{u'position': 0, u'name': u'player1', u'var_c': 400, u'var_r': 0, u'var_b': 0, u'var_v': 10, u'var_p': 0, u'var_pp': 0}, 
{u'position': 1, u'name': u'player2', u'var_c': 280, u'var_r': 0, u'var_b': 0, u'var_v': 20, u'var_p': 0, u'var_ppf': 0}, 
{u'position': 2, u'name': u'player3', u'var_c': 196, u'var_r': 0, u'var_b': 0, u'var_v': 30, u'var_p': 0, u'var_pp': 0}, 
{u'position': 3, u'name': u'player4', u'var_c': 0, u'var_r': 0, u'var_b': 0, u'var_v': 40, u'var_p': 0, u'var_pp': 0}]
Then the next cell (foo$JSON_var[2]) has the same format but with different values, etc. I'm looking to break this up into a .csv file that has variable names: position, name, var_c, var_r, var_b, var_v, var_p, and var_pp. (again, I need this for the whole data set, not just the single cell I gave as the example).

I tried using

Code:
json_data <- fromJSON(foo$JSON_var)
but got error "STRING_ELT() can only be applied to a 'character vector', not a 'integer'"

I also tried (based on what I have read on other threads)

Code:
json_data <- fromJSON(paste(readLines(foo$JSON_var), collapse=""))
But I got "Error in readLines(foo$JSON_var) :
'con' is not a connection"

Any help with trying to figure out how to parse out these data into a usable form would be highly appreciated. Thanks for looking.

**Edited to fix the assignment tag in the first code snippet**
 
Last edited:

bryangoodrich

Probably A Mammal
#2
Put your JSON into a file and read it in that way. Otherwise, you have to put it into a textConnection object. Your code right now doesn't even make sense because "==" is an equality check, not an assignment. Furthermore, you don't have any quotes around the JSON which is supposed to be a text string.

Code:
library(RJSONIO)
con <- textConnection("[{u'position': 0, u'name': u'player1', u'var_c': 400, u'var_r': 0, u'var_b': 0, u'var_v': 10, u'var_p': 0, u'var_pp': 0}, 
{u'position': 1, u'name': u'player2', u'var_c': 280, u'var_r': 0, u'var_b': 0, u'var_v': 20, u'var_p': 0, u'var_ppf': 0}, 
{u'position': 2, u'name': u'player3', u'var_c': 196, u'var_r': 0, u'var_b': 0, u'var_v': 30, u'var_p': 0, u'var_pp': 0}, 
{u'position': 3, u'name': u'player4', u'var_c': 0, u'var_r': 0, u'var_b': 0, u'var_v': 40, u'var_p': 0, u'var_pp': 0}]")
x <- fromJSON(con)  # List representation of that JSON

# or directly as raw text; I would avoid this, personally. 

x <- fromJSON("[{u'position': 0, u'name': u'player1', u'var_c': 400, u'var_r': 0, u'var_b': 0, u'var_v': 10, u'var_p': 0, u'var_pp': 0}, 
{u'position': 1, u'name': u'player2', u'var_c': 280, u'var_r': 0, u'var_b': 0, u'var_v': 20, u'var_p': 0, u'var_ppf': 0}, 
{u'position': 2, u'name': u'player3', u'var_c': 196, u'var_r': 0, u'var_b': 0, u'var_v': 30, u'var_p': 0, u'var_pp': 0}, 
{u'position': 3, u'name': u'player4', u'var_c': 0, u'var_r': 0, u'var_b': 0, u'var_v': 40, u'var_p': 0, u'var_pp': 0}]")

# or just store it directly as a string object; this really isn't much different than how JS parses JSON from a variable

str <- "[{u'position': 0, u'name': u'player1', u'var_c': 400, u'var_r': 0, u'var_b': 0, u'var_v': 10, u'var_p': 0, u'var_pp': 0}, 
{u'position': 1, u'name': u'player2', u'var_c': 280, u'var_r': 0, u'var_b': 0, u'var_v': 20, u'var_p': 0, u'var_ppf': 0}, 
{u'position': 2, u'name': u'player3', u'var_c': 196, u'var_r': 0, u'var_b': 0, u'var_v': 30, u'var_p': 0, u'var_pp': 0}, 
{u'position': 3, u'name': u'player4', u'var_c': 0, u'var_r': 0, u'var_b': 0, u'var_v': 40, u'var_p': 0, u'var_pp': 0}]"
x <- fromJSON(str)
 
#4
Sorry, I haven't input the first code snippet, I was just showing what the data looks like in each cell (thus, I'm not actually using the "==" part, bad form on my part :-/ I edited the post).

What do you mean put the JSON into a file and read it in that way? Does this mean make a new object:

Code:
JSON_data <- foo$JSON_var
Then perform the fromJSON function?

Where do you mean to add quotes in the file? Outside of the [] ? or for each object in the {} ?

Sorry for all of the questions, this is my first time dealing with JSON format.
 

bryangoodrich

Probably A Mammal
#5
To your point about making this tabular, since it ultimately is tabular, you could do something like

Code:
x <- fromJSON(str, simplify = TRUE)
x
# [[1]]
# 'position     'name    'var_c    'var_r    'var_b    'var_v    'var_p   'var_pp 
#         0        NA       400         0         0        10         0         0 
# 
# [[2]]
# 'position     'name    'var_c    'var_r    'var_b    'var_v    'var_p  'var_ppf 
#         1        NA       280         0         0        20         0         0 
# 
# [[3]]
# 'position     'name    'var_c    'var_r    'var_b    'var_v    'var_p   'var_pp 
#         2        NA       196         0         0        30         0         0 
# 
# [[4]]
# 'position     'name    'var_c    'var_r    'var_b    'var_v    'var_p   'var_pp 
#         3        NA         0         0         0        40         0         0 

# Looks pretty tabular!
x<- do.call(rbind, x)
x
#      'position 'name 'var_c 'var_r 'var_b 'var_v 'var_p 'var_pp
# [1,]         0    NA    400      0      0     10      0       0
# [2,]         1    NA    280      0      0     20      0       0
# [3,]         2    NA    196      0      0     30      0       0
# [4,]         3    NA      0      0      0     40      0       0

# Ugly columns, though
colnames(x) <- gsub("\'", "", colnames(x))
x
#      position name var_c var_r var_b var_v var_p var_pp
# [1,]        0   NA   400     0     0    10     0      0
# [2,]        1   NA   280     0     0    20     0      0
# [3,]        2   NA   196     0     0    30     0      0
# [4,]        3   NA     0     0     0    40     0      0

write.csv(x, file = "whatever.csv", row.names = FALSE)
 
#6
Thanks for the quick replies! I am having some trouble with the codes that you gave, though. When I run

Code:
> str <- "[{u'position': 0, u'name': u'player1', u'var_c': 400, u'var_r': 0, u'var_b': 0, u'var_v': 10, u'var_p': 0, u'var_pp': 0}, 
 {u'position': 1, u'name': u'player2', u'var_c': 280, u'var_r': 0, u'var_b': 0, u'var_v': 20, u'var_p': 0, u'var_ppf': 0}, 
 {u'position': 2, u'name': u'player3', u'var_c': 196, u'var_r': 0, u'var_b': 0, u'var_v': 30, u'var_p': 0, u'var_pp': 0}, 
 {u'position': 3, u'name': u'player4', u'var_c': 0, u'var_r': 0, u'var_b': 0, u'var_v': 40, u'var_p': 0, u'var_pp': 0}]"
> x<-fromJSON(str)
I get "Error in fromJSON(str) : unexpected character 'u'"

And when I run

Code:
con <- textConnection("[{u'position': 0, u'name': u'player1', u'var_c': 400, u'var_r': 0, u'var_b': 0, u'var_v': 10, u'var_p': 0, u'var_pp': 0}, 
 {u'position': 1, u'name': u'player2', u'var_c': 280, u'var_r': 0, u'var_b': 0, u'var_v': 20, u'var_p': 0, u'var_ppf': 0}, 
 {u'position': 2, u'name': u'player3', u'var_c': 196, u'var_r': 0, u'var_b': 0, u'var_v': 30, u'var_p': 0, u'var_pp': 0}, 
 {u'position': 3, u'name': u'player4', u'var_c': 0, u'var_r': 0, u'var_b': 0, u'var_v': 40, u'var_p': 0, u'var_pp': 0}]")
> x<-fromJSON(con)
I get "Error in fromJSON(con) :
STRING_ELT() can only be applied to a 'character vector', not a 'integer'"

Any idea as to what I'm doing wrong here?
 

bryangoodrich

Probably A Mammal
#7
Are you using the RJSONIO library like I indicated? I don't know why the 'u' is there in the first place. I'm pretty sure it's not part of the JSON standard. I'd remove them.

Code:
str <- gsub("u\'", "\'", str)  # Regular expression pattern match and substitution. Replace u' with ' anywhere in str.
I should probably also indicate which version of R and the library I am using, but I'm lazy atm.