Large datasets from txt files won't load completely, tried the suggested solutions!

Dear all,

I'm writing my thesis and am using Stata 12 SE to analyze news messages coded by sentiment level. However, I'm having difficulty loading a whole dataset into Stata and can't figure out why. I've tried the following:

The first dataset is the "2003" file. I've checked using LTF viewer (similar to Notepad, but able to handle more lines) that the txt file contains 1,049,323 lines/observations (896MB). However, when I load it into Stata (using "Import>Text data created by spreadsheet"), the data properties window shows only 527,920 observations, with Size 1,079.43M and Memory 1,216M. So that's only 50.3% of the original number of observations. Furthermore, the loaded data isn't simply cut off halfway through the year.

Secondly, I tried loading a different file, "2005". LTF viewer shows 1,439,408 lines. The file is 1.25GB. When I load this file into Stata, I get 778,087 observations, so 54.1% of the original.
The data properties further show: Size 1,635.46M and Memory 1,856M.
So the results are similar, but not identical, to those for the first dataset.
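
For anyone who wants to double-check the percentages above, here is a minimal Python sketch. The function name `loaded_share` is made up for illustration, and it assumes every line in the txt file is one observation (a header line, if present, would shift the result only negligibly).

```python
def loaded_share(loaded_obs, total_lines):
    """Percentage of source lines that ended up as observations."""
    return 100.0 * loaded_obs / total_lines

# Counts quoted in this post (LTF viewer line counts vs. Stata observations)
share_2003 = loaded_share(527_920, 1_049_323)
share_2005 = loaded_share(778_087, 1_439_408)

print(f"2003: {share_2003:.1f}%")  # 2003: 50.3%
print(f"2005: {share_2005:.1f}%")  # 2005: 54.1%
```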

Note: both files have 82 variables.

I tried both files on the following three computers using Stata 12 SE (32-bit version on the first two, 64-bit version on the third):
1 University server computer: Windows 7 32-bit, 4GB RAM (3.17GB available)
2 My own computer: Windows 7 32-bit, 2GB RAM (1.87GB available)
3 Friend's computer: MacBook Pro dual-booted into Windows 7 64-bit, 4GB RAM

So to recap, I've tried every way I could think of to get my txt files loaded into Stata. However, I only get about half of the observations, while the properties window in Stata shows a larger size than the original file (though that doesn't seem odd to me).

Has anyone encountered similar problems? And does anybody know the solution to this?

Many thanks,

Maurits Munninghoff
You can try increasing the default amount of RAM allocated to Stata to 2GB with the following command:

set memory 2g
Dear cassoulet,

That's not it; Stata 12 automatically allocates as much RAM as is needed, limited only by what the hardware has available. However, I found a way to work around it. If anyone is interested, visit the forum of with the thread name the same as this one.

The bottleneck was unreadable or misinterpreted characters in the dataset. It's strange, though, that Stata didn't give a warning message that observations were omitted.
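
For anyone hitting the same thing, a rough way to look for such characters before importing is to scan the raw txt file outside Stata. The Python sketch below is only an illustration: the function name `scan_file` is made up, and the list of "suspect" characters (NUL bytes, Ctrl-Z, an odd number of double quotes on a line) is my assumption about what commonly trips up text importers, not a list confirmed for Stata or for these files.

```python
import os
import tempfile

# Assumed set of troublesome bytes; adjust to whatever the data turns out to contain.
SUSPECT = {0x00: "NUL", 0x1A: "Ctrl-Z (legacy end-of-file)"}

def scan_file(path):
    """Return (line_count, findings); findings is a list of (line_no, reason)."""
    findings = []
    line_count = 0
    with open(path, "rb") as fh:  # binary mode, so odd bytes are not silently decoded
        for line_no, raw in enumerate(fh, start=1):
            line_count = line_no
            for byte, name in SUSPECT.items():
                if byte in raw:
                    findings.append((line_no, name))
            # An unbalanced quote can make an importer merge several lines into one field.
            if raw.count(b'"') % 2 == 1:
                findings.append((line_no, "odd number of double quotes"))
    return line_count, findings

# Small made-up sample; the real input would be the exported txt file.
sample = b'id\ttext\n1\tplain line\n2\tunclosed "quote\n3\tctrl\x1az here\n'
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
lines, problems = scan_file(tmp.name)
os.remove(tmp.name)

print(lines)     # total number of lines in the file
print(problems)  # flagged line numbers with the reason
```

Comparing the line count from such a scan with the number of observations Stata reports, and then inspecting the flagged lines, should show whether characters like these explain the gap.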

Kind regards