I’ve been writing some import scripts for a system at work that will take data in either CSV, XLS, XML or a database (via Zend_Db), then store that data in a common format for us to manipulate and use for generating charts and tables.
I quickly ran into a problem with PHP: the memory limit. If I import an Excel file or database with over 25,000 rows of data, I soon hit my memory limit (which on my dev box is set at 256MB!).
At first I looked for problems in my classes, then for memory leaks related to various PHP bugs. In the end, although I’d managed to cut memory usage down by a quarter, the app still used way too much memory.
I decided to write a script to test how much memory PHP needed just to store this data in an array, not using my class based data sets. I used the following script:
$data = array();
for ($i = 0; $i < 10000; $i++) {
    $data[] = array('one', 'two', 'three', 'four', 'five', 'six');
}
The result: 19MB!
If I compare the same sort of data structure in python:
data = []
count = 0
while (count < 10000):
    data.append(['one', 'two', 'three', 'four', 'five', 'six'])
    count = count + 1
The result: 1.39MB
The difference is huge. Unfortunately, I think I’m going to have to use a different programming language to handle this part of the project; I don’t think PHP is up to the task. I am a huge fan of PHP and use it for most things, but even allowing for its need to remain flexible, I can’t see why it needs so much memory!
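The post doesn’t say how the figures above were measured. As a rough sketch, one way to get a number for the Python side is the standard library’s tracemalloc module (Python 3.4 and later, so newer than the interpreters discussed here). The function name build_rows is just a label for this example, and the figure it prints will vary by Python version and platform, so don’t expect it to match 1.39MB.

```python
import tracemalloc


def build_rows(n):
    """Build n rows of six short strings, mirroring the test script above."""
    data = []
    for _ in range(n):
        data.append(['one', 'two', 'three', 'four', 'five', 'six'])
    return data


# Only allocations made while tracing is active are counted.
tracemalloc.start()
rows = build_rows(10000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("rows built: %d" % len(rows))
print("traced memory: %.2f MB" % (current / (1024.0 * 1024.0)))
```

This counts only memory allocated by Python objects during the run, not the interpreter’s baseline footprint, so it is not directly comparable to a process-level number.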
I also produced a graph, I thought it would be pretty:

What version of PHP did you use? Versions below PHP 5.3 don’t handle memory all that well, just look at this graph.
Yeah I spotted that graph, I was hoping that was the problem, but all tests were performed on 5.3.2.
Stranger still, I found that Ubuntu servers running the package-maintained PHP seemed to use a lot more memory than PHP compiled manually!?
IMO this test is not valid. PHP stores every single string you inserted into the array, while Python has a little magic in this case: Python stores each string constant in memory only once, and then just uses pointers to it. I guess your Excel sheets are not repeating the exact same strings 25 thousand times. My results: Python 2.6.6 used 6.4 MB of memory to store different strings (with your script it used 3.5 MB on a 64-bit machine), while the equivalent PHP script used 22 MB. It’s still a lot, but I think the comparison is better this way.
My test scripts:
test.py
data = []
count = 0
while (count < 10000):
    data.append([
        'one%d' % count,
        'two%d' % count,
        'three%d' % count,
        'four%d' % count,
        'five%d' % count,
        'six%d' % count
    ])
    count += 1
test.php
<?php

$data = array();
for ($i = 0; $i
Btw. regardless of the programming language, I don’t like big files kept in memory. I’d recommend using some smartly implemented iterators to aggregate the data in the file rather than storing the entire stuff in the memory.
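To make the iterator idea concrete, here is a minimal sketch in Python of aggregating a file row by row instead of loading it all first. It assumes a two-column CSV (a label and a number); the sample data and the names iter_rows and column_totals are made up for illustration, and it says nothing about how an Excel reader would expose its rows.

```python
import csv
import io


def iter_rows(handle):
    """Yield one parsed CSV row at a time instead of building a full list."""
    for row in csv.reader(handle):
        yield row


def column_totals(rows):
    """Sum the numeric second column per label, keeping only the running totals."""
    totals = {}
    for label, value in rows:
        totals[label] = totals.get(label, 0.0) + float(value)
    return totals


# Hypothetical sample data; in practice this would be an open file handle.
sample = io.StringIO("north,10\nsouth,5\nnorth,2.5\n")
totals = column_totals(iter_rows(sample))
print(totals)
```

Only one row and the totals dictionary are alive at any time, so memory use depends on the number of distinct labels rather than the number of rows.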
I agree, iterating over the data in a file would be perfect. Unfortunately the classes we’re using to import the data from Excel files (which is the case causing memory limits to be hit) are not coded in this way!
Short of rewriting that class, I’m stuck with the entire file’s contents in memory temporarily.
I wasn’t aware of the string constants in memory, that does make my tests “a little” unfair!
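The string sharing is easy to see for yourself. The sketch below, which assumes CPython (other implementations may behave differently), counts how many distinct string objects end up in the nested list, first with repeated literals and then with strings built per row. The helper names are only for this example.

```python
def constant_rows(n):
    """Rows built from the same six string literals every time."""
    return [['one', 'two', 'three', 'four', 'five', 'six'] for _ in range(n)]


def distinct_rows(n):
    """Rows whose strings differ per row, so nothing can be shared."""
    return [['one%d' % i, 'two%d' % i, 'three%d' % i,
             'four%d' % i, 'five%d' % i, 'six%d' % i] for i in range(n)]


def count_string_objects(rows):
    """Count distinct string objects by identity, not by value."""
    return len({id(s) for row in rows for s in row})


shared = constant_rows(10000)
unique = distinct_rows(10000)

print("distinct objects with literals: %d" % count_string_objects(shared))
print("distinct objects with built strings: %d" % count_string_objects(unique))
```

With literals, every row points at the same handful of string objects, so the list mostly costs pointers; with per-row strings, each one is a separate allocation, which is much closer to what real imported data looks like.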
Sorry, the PHP example has been destroyed by the engine. I hope this time it will work:
1
OK, I give up. Imagine the code.
[...] shared a blogpost on Twitter yesterday titled PHP vs Python array memory allocation. It states that storing ten thousands of arrays of strings in an array costs only 1.39 MB of memory [...]