[Python] - dealing with large text files

#1
Hi,
I have a large file (2 GB) with 4 columns that I want to read and extract some info from. The data looks like this:

CHR START END A
1 10583 10583 0.14
1 10611 10611 0.02
1 13302 13302 0.11

I also have another file from which I have extracted a string to be compared with col1:col2:col3. If there is a match, I want to extract col4 ("A"). The file is too large and my computer does not fancy that. Could anyone help me?
 

Dason

Ambassador to the humans
#2
Re: Python - dealing with large text files

Can you provide some sample input (you already have this), what you're comparing it to, and what you want to get out of it as a result? Thanks.
 
#3
Re: Python - dealing with large text files

I have a few thoughts, though the info Dason requested would be helpful:

You'll want to read the large file line by line since you don't want to store all 2 GB in memory. If you use a for loop on a file, Python will run through it one line at a time:

Code:
with open('filename', 'r') as f:    # the with-block closes the file automatically
    for line in f:
        pass                        # do something with each line
The csv module may be helpful for parsing, or you can just do line.split('separator') to get a list of your column values.
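
For instance, a minimal sketch using the csv module (assuming single spaces separate the columns, as in the sample, and that 'filename' is a placeholder):

Code:
import csv

with open('filename', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    header = next(reader)               # skip the CHR START END A header row
    for chrom, start, end, a in reader:
        pass                            # four fields per row, already split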

Once you've read a line, you'll want to check it right then for that string you found:

Code:
your_string = 'whatever the string is'
with open('filename', 'r') as f:
    for line in f:
        values = line.split()                     # split on whitespace
        if ':'.join(values[:3]) == your_string:   # compare against col1:col2:col3
            print(values[3])                      # do something with values[3]
Probably not very efficient, but it will work!
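
If the other file turns out to hold many strings rather than just one, loading them into a set first keeps everything to a single pass over the 2 GB file. A minimal sketch, assuming hypothetical filenames 'keys.txt' (one col1:col2:col3 string per line) and 'large_file.txt':

Code:
# Load every lookup key into a set for fast membership tests.
with open('keys.txt', 'r') as f:
    keys = set(line.strip() for line in f)

# Stream the large file once, printing column A for each matching row.
with open('large_file.txt', 'r') as f:
    next(f)                     # skip the CHR START END A header
    for line in f:
        values = line.split()
        if ':'.join(values[:3]) in keys:
            print(values[3])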
 

Dason

Ambassador to the humans
#4
Re: Python - dealing with large text files

Yeah, I wanted to know because, like you said, it's not too bad, but it depends on what exactly they're looking for. Another alternative would be a short awk script.