splitting multiple values in one variable to several variables

#1
Hi,

I use several online measurement instruments for my research and for some questions the respondent can select several options. When exporting the data all the selected options are put into one variable. So I end up with variables which contain more than one value.
I wonder whether it is possible in SPSS to split these values into several variables so I have one value per variable for analysis. I now do this by copying colums of data to excel and then transform data to text and then copying it back to excel, but this takes a lot of time.

I'd be very gratefull for any tips!

Isabel email: imarrozos@trajectum.info
 
#2
This is fairly trivial to accomplish if you have a passing familiarity with SPSS syntax. The general approach is to treat the value as a string variable in order to parse it into separate numerical variables. Have a go at adapting this example to your needs. If you have trouble, post back with several examples of your data now and how you want it to appear and I'll help you further.
 
#3
@helicon,

Thanks so much for your reply but I cannot work it out from the syntax example you have shown me. I would like to attach an example of the data but can''t upload an spss file? How can I get an SPSS file example to you??
 
#4
Just something like this:

Code:
Now       ->    After
oldvar            v1      v2     v3     v4
------            ---     ---    ---    ---
1234              1        2      3       4
5423              5        4      2       3
952               9        5      2
etc.
If the desired values are always of equal length (i.e. always single digit or two digits etc), it's easy. If they're delimited by some character, it's easy. If they have varying lengths it can be a bit more tricky. I just need to see enough of a sample to understand if the logic will hold across all your cases.
 
#5
I've added an example in excell. Up till now I've split the variable in excel before reading it in to spss but I would rather do it in SPSS using a syntx as this would save me such a lot of work.

So for example variable C254 or C256 I would like to split the values delimited by ; into separate variables.

Again thanks so much for your time and help.
 
#6
OK, first of all the syntax was sourced from here so all credit goes to the original author.

As you seem to have many variables where this will need to be applied, it is best to use a macro so that it can be called once for each applicable variable rather than having to copy and paste the syntax and swap out the variable names. The following will split the variable into a predefined number of values (choose the maximum number of possible answers, currently set at 5) and write the new variables at end of the list. C254 will become C2541, C2542, C2543, C2544, C2545.

Example data:

Code:
data list list / c254 (a30) c256 (a30).
begin data.
4.0000;3.0000;8.0000;9.0000  5.0000;10.0000
1.0000;2.0000  9.0000;8.0000;7.0000
end data.
Macro:

Code:
DEFINE !parse (var=!TOKENS(1) /nbval=!TOKENS(1))
COMPUTE !var=CONCAT(RTRIM(!var),';').
STRING #str(A8).
VECTOR !var (!nbval F8.0).
COMPUTE #beg=1.
LOOP #cnt=1 TO !nbval.
+COMPUTE #str=SUBSTR(!var,#beg).
+COMPUTE #end=INDEX(#str,';')-1.
+DO IF #end=-1.
+ BREAK.
+END IF.
+COMPUTE !var(#cnt)=NUMBER(SUBSTR(#str,1,#end),F8.0).
+COMPUTE #beg=#beg+#end+1.
END LOOP IF #end=-1.
EXECUTE.
!ENDDEFINE.

* Call the macro.
!parse var=c254 nbval=5.
!parse var=c256 nbval=5.
You'll need to append this '!parse var=VAR nbval=5.' to the file for each variable you want to split, filling in the appropriate variable name and number to split where red. You'll also get errors when cases are empty, but these can be ignored.
 
#7
Hi all.

I have almost the same problem but with a string of words separated by empy spaces.
Can you help me with an update of the spss macro to this case.
I'm attaching a file with some example rows.

Thank you very much!!!
 
#8
Finally I have created an adaptation of the SPSS Macro to my problem, the code is

Code:
DEFINE !parse2 (var=!TOKENS(1) /nbval=!TOKENS(1))
COMPUTE !var=CONCAT(RTRIM(!var),' ').
STRING #str(A30).
VECTOR !var (!nbval A30).
COMPUTE #beg=1.
LOOP #cnt=1 TO !nbval.
+COMPUTE #str=SUBSTR(!var,#beg).
+COMPUTE #end=INDEX(#str,' ')-1.
+DO IF #end=-1.
+BREAK.
+END IF.
+COMPUTE !var(#cnt)=SUBSTR(#str,1,#end).
+COMPUTE #beg=#beg+#end+1.
END LOOP IF #end=-1.
EXECUTE.
!ENDDEFINE.
same way to call the Macro.

I hope this could help anyone with similar problem.
I include a file with an example of macro's work

Best Regards.
 
#9
Thank you guys. These macros work perfect. However, I was wondering if the macro can be updated to split the variable into several variables and named them based on their content.

For instance, this is the variable V_1 to be parsed (data separated by //):

V_1: CURRENT MEDICATION: ADVIL // PAST MEDICATION: LYRICA // DATE LAST WORK: 01-14-83 // RETRAINING: SECRETARY

I would like to obtain the new following variables:

V_CURRENT_MEDICATION: ADVIL
V_PAST_MEDICATION: LYRICA
v_DATE_LAST_WORK: 01-14-83
V_RETRAINING: SECRETARY

Thank you so much.
 
#10
The straightforward thing to do would be to run the macro as is and just retain the parts of the strings you're interested in then rename the variables. Append the following syntax beneath the macro:

Code:
do repeat #x = V_11 to V_14.
compute #y = char.index(#x, ":").
compute #z = char.length(#x).
compute #x = substr(#x, #y+2, #z-#y).
end repeat.
compute V_DATE_LAST_WORK = number(V_13, adate8).
formats V_DATE_LAST_WORK (adate8).
exe. 

rename variables (V_11 = V_CURRENT_MEDICATION) (V_12 = V_PAST_MEDICATION) (V_14 = V_RETRAINING).
delete variables V_13.
 
Last edited:
#11
Hi, friends,

The posts above were super helpful and I've tried playing around with it. Still, I'm having great difficulty manipulating the codes to my scenario, and would appreciate your expertise.

Essentially, this is what I am trying to do: Split a string into new variables based on the position of the information within a string. Example:

Code:
oldvar        -->      var1    var2    var3  var4
661234                   6       6      1     234
892123                   8       9      2     123
413433                   4       1      3     433
458739                   4       5      8     739

I know it's a trivial concept but I've already spent 3 days searching the web for a solution.

Cheers,
Rick
 
#12
Hey guys, I think I've found a solution to my previous question:

COMPUTE var1=SUBSTR(oldvar,1,1).
COMPUTE var2=SUBSTR(oldvar,2,1).
COMPUTE var3=SUBSTR(oldvar,3,1).
COMPUTE var4=SUBSTR(oldvar,4,3).
Execute.
 
#13
@ Helicon:

Hi! I hope you are well!

I am not very familiar with SPSS syntax at all, but in my dataset I have a a string variable that I need to parse it into separate numerical variables. The desired values are divided by semicolumns. Ex: 493.90;521.00;V15.85 or 293.83;296.22;724.5;V15.85;V70.0. Desired values are not always of the same lenght and are not always the same in number within the text unparsed variable. I tied to adapt your syntax from the previous examples but it did not work. The text variable name is @_x000D_Diagnoses. Any chance you could help me?

Many thanks!
 
#14
Here you go:

Note, this assumes that the number of variables to be parsed is no more than 10. If it is more then change the 10 on the third line to whatever the maximum number of variables is.

Code:
string #diagnoses (a250).
compute #diagnoses=concat(rtrim(@_x000D_Diagnoses), ";").
vector @_x000D_Diagnoses(10, a20).
compute #i = 1.
loop if char.index(#diagnoses, ";") gt 0.
compute #ind=char.index(#diagnoses, ";").
compute @_x000D_Diagnoses(#i)=substr(#diagnoses,1, #ind-1).
compute #diagnoses=substr(#diagnoses, #ind+1).
compute #i = #i + 1.
end loop.
exe.
 
#15
Hi all, I'm late to this ballgame, but I have found these posts to be very helpful. The issue I am running into is that after calling the macro my data is being truncated by one var. i.e. with the 2 and 15 in problem not outputting and the 10 and 3 in race not outputting.

----------

DATA LIST LIST /problem(A12) race(A12).
BEGIN DATA
'1 2' '4 5 6 9 10'
'10 11 3 15' '5 2 3'
END DATA.
LIST.


* Define a macro to do the job.
DEFINE !parse (var=!TOKENS(1) /nbval=!TOKENS(1))
COMPUTE !var=CONCAT(RTRIM(!var),' ').
STRING #str(A8).
VECTOR !var (!nbval F8.0).
COMPUTE #beg=1.
LOOP #cnt=1 TO !nbval.
+COMPUTE #str=SUBSTR(!var,#beg).
+COMPUTE #end=INDEX(#str,' ')-1.
+DO IF #end=-1.
+ BREAK.
+END IF.
+COMPUTE !var(#cnt)=NUMBER(SUBSTR(#str,1,#end),F8.0).
+COMPUTE #beg=#beg+#end+1.
END LOOP IF #end=-1.
EXECUTE.
!ENDDEFINE.

* Now call the macro.
!parse var=problem nbval=5.
!parse var=race nbval=5.
 
#16
I ran it with your sample data and didn't encounter the same issue. Here is the output of list:

Code:
problem1 problem2 problem3 problem4 problem5    race1    race2    race3    race4    race5 
 
       1        2        .        .        .        4        5        6        9       10 
      10       11        3       15        .        5        2        3        .        .
Perhaps try again, or provide a snippet of the actual data where you're encountering the problem.
 
#17
This is fairly trivial to accomplish if you have a passing familiarity with SPSS syntax. The general approach is to treat the value as a string variable in order to parse it into separate numerical variables. Have a go at adapting this example to your needs. If you have trouble, post back with several examples of your data now and how you want it to appear and I'll help you further.
Hello! Could you help me please to create synthax for splitting string variable 'cause1' containing 4-symbol code - the first are letters from V to Y (g.e. "V490") into two separate variables. The first one corresponding to the first letter but coded as: 1 "V"; 2 "W" 3 "X" 4 "Y" to make it numeric. The second also numeric containing code of three numbers.
Thank you very much in advance!
 
#18
I am running into somewhat of a more complicated issue. I have four large data sets (approximately 700 respondents in each) and some questions had responses where multiple answers could be selected. This is across several variables in the files however, I will use only one here as an example.

One variable in my data set (maspire01) has 14 responses that participants could have selected. The data was saved as an array of numbers separated by a *. For example, if all 14 answers were selected the data appear in that column as 1*2*3*4*5*6*7*8*9*10*11*12*13*14. My intention is to use the macro code provided in this documentation to separate the values into 14 variables. Where if participants selected 1 then, V1 will only be populated with the value 1 and would be blank for participants who did not select 1. Same for the other 13 variables that were computed.

I tried to alter the macro code (the code I altered is below) and apply it to my data set and the 14 variables create fine and I do not receive any errors but, all of the computed variables have missing values.


DEFINE !parse (var=!TOKENS(1) /nbval=!TOKENS(1))
COMPUTE !var=CONCAT(RTRIM(!var),'*').
STRING #str(A100).
VECTOR !var (!nbval F8.0).
COMPUTE #beg=1.
LOOP #cnt=1 TO !nbval.
+COMPUTE #str=SUBSTR(!var,#beg).
+COMPUTE #end=INDEX(#str,';')-1.
+DO IF #end=-1.
+ BREAK.
+END IF.
+COMPUTE !var(#cnt)=NUMBER(SUBSTR(#str,1,#end),F8.0).
+COMPUTE #beg=#beg+#end+1.
END LOOP IF #end=-1.
EXECUTE.
!ENDDEFINE.

* Call the macro.
!parse var=maspire01 nbval=14.



Any help here would be greatly appreciated!!!