- Pickle and store a list to file
- Numeric list to a binary-encoded file with array
- Text write for a flat list
- To csv for multi-column lists
- To csv with pandas
Converting a list into flat, serialized data for storage.
Py3: Pickling for files
```python
import pickle

original = [1, 2, 3, [3, 4, 5, [6, 7, 8]]]

# write pickled data to file
with open('piclist.pkl', 'wb') as f:
    pickle.dump(original, f)

# reload pickled data from file
with open('piclist.pkl', 'rb') as f:
    loaded = pickle.load(f)

print(original, id(original))
print(loaded, id(loaded))
#= [1, 2, 3, [3, 4, 5, [6, 7, 8]]] 43392328
#  [1, 2, 3, [3, 4, 5, [6, 7, 8]]] 43393864
# (id values differ from run to run)
# data and structure are restored;
# loaded is a new object, effectively a deep copy
```
Notes: Serializing or flattening
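To store the data held in a list, it must first be flattened into a serial format. Pickle writes the serialized data out in a binary format, along with the structure needed to recreate the original list. The format is not secure: a tampered or maliciously crafted pickle file can execute arbitrary code when it is loaded, so only unpickle files from trusted sources, and do not use pickle files as a database. A pickled file does work well as temporary storage, even for very large lists. The shelve module builds on pickle and stores objects in a persistent, dictionary-like object with string keys. The marshal module can also store lists, but only of simple built-in types, and its format may change between Python versions. For temporary storage, pickle is widely supported and backward compatible: newer Python versions can read files written with older pickle protocols. Custom classes can control how they are pickled through the __reduce__ and __reduce_ex__ methods.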
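A minimal sketch of the two alternatives named in the note, storing the same nested list with shelve and with marshal. The key 'original' and the file name 'lists' are illustrative choices. The files go into a temporary directory, and the exact file name(s) shelve creates depend on which dbm backend is available on the system.

```python
import marshal
import os
import shelve
import tempfile

original = [1, 2, 3, [3, 4, 5, [6, 7, 8]]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'lists')  # illustrative file name

    # shelve: a persistent dict-like object; values are pickled under string keys
    with shelve.open(path) as db:
        db['original'] = original
    with shelve.open(path) as db:
        from_shelve = db['original']

# marshal: serializes simple built-in types to bytes and back
data = marshal.dumps(original)
from_marshal = marshal.loads(data)

print(from_shelve == original, from_marshal == original)  # True True
```

Shelve suits the case of several named lists in one file; marshal is best kept for short-lived data read by the same Python version.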
Homogeneous numeric data written to file using arrays.
Py3: Binary read/write using arrays
```python
import array

original = [1, 2, 3, 4, 5]

# list to array: 'h' is the type code for signed short
arr_org = array.array('h', original)

# save array as raw bytes
with open('numbers.byt', 'wb') as f:
    arr_org.tofile(f)

# load array, reading only the first 3 elements
arr_rel = array.array('h')
with open('numbers.byt', 'rb') as f:
    arr_rel.fromfile(f, 3)

# array to list
reloaded = arr_rel.tolist()

print(id(original), original)  #= 4870536 [1, 2, 3, 4, 5]
print(id(reloaded), reloaded)  #= 4871048 [1, 2, 3]
```
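Notes: Convert to arrays for read/write
The array module provides a handy way to write homogeneous numeric data to file as binary, byte-encoded data. An array holds numbers of a single type, set by its type code (for example 'h' for signed short integers or 'd' for double-precision floats), so integers and floats cannot be mixed in one array, and every value must fit the chosen type. The numbers are written as raw bytes to a binary file and read back into an array with the same type code. The bytes use the machine's native byte order, so a file may need array.byteswap() when moved between platforms of different endianness. The module also converts quickly back and forth between arrays and lists, with array.array(typecode, a_list) and arr.tolist().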
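The example above reads a fixed count of 3 elements. A sketch of reading the whole file instead, using frombytes() on the file contents so no count is needed, plus a check that mixing a float into an integer array is rejected. It writes to a temporary directory rather than the working directory.

```python
import array
import os
import tempfile

original = [1, 2, 3, 4, 5]
arr_org = array.array('h', original)   # 'h' = signed short
print(arr_org.itemsize)                # bytes per element, typically 2

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'numbers.byt')
    with open(path, 'wb') as f:
        arr_org.tofile(f)
    # read every byte in the file and decode it as signed shorts
    arr_rel = array.array('h')
    with open(path, 'rb') as f:
        arr_rel.frombytes(f.read())

reloaded = arr_rel.tolist()
print(reloaded)  # [1, 2, 3, 4, 5]

# a float in an integer array raises TypeError
try:
    array.array('h', [1, 2.5])
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
print(mixed_rejected)  # True
```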
Writing data to a comma-separated text file.
Py3: Basic csv
```python
separator = ','
original = [1, 2.2, '3', 'abc']

# save data as text
with open('linear.csv', 'wt') as f:
    f.write(separator.join([str(i) for i in original]))

# load data as text
with open('linear.csv', 'rt') as f:
    reloaded = f.read().split(separator)

print(original)  #= [1, 2.2, '3', 'abc']
print(reloaded)  #= ['1', '2.2', '3', 'abc']
```
Notes: Single column of mixed data
A flat list of numbers and strings needs no further serialization. Each element is converted to text, and the pieces are joined with the separator into one long string, which is stored as text. Nested lists would be written as their printed representation and could not be split back into the original structure. Choose a separator that does not occur in the data. The reloaded text is split on the same separator. Note that every reloaded element is a string; recovering the original mixed types requires parsing each element with added checks.
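One way to parse the reloaded strings back into mixed types, as the note suggests, is to try each conversion in turn. The helper parse_item below is hypothetical, written for this illustration: it tries int, then float, and otherwise keeps the string.

```python
def parse_item(text):
    """Hypothetical helper: try int, then float, otherwise keep the string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass  # not this type, try the next one
    return text

reloaded = ['1', '2.2', '3', 'abc']
parsed = [parse_item(item) for item in reloaded]
print(parsed)  # [1, 2.2, 3, 'abc']
```

The element '3', a string in the original list, comes back as the integer 3: the text file does not record the original types, so checks like these can only guess.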
Multiple columns stored side by side, without headers.
Py3: Two column save to csv
```python
sep = ','
col1 = [1, 2, 3, 4]
col2 = [1.1, 2.2, 3.3, 4.4]

# pair data columns and write out each text row,
# adding a newline after each row
with open('test.csv', 'wt') as f:
    for row in zip(col1, col2):
        f.write(sep.join(map(str, row)))
        f.write('\n')

# read lines, strip off newline
with open('test.csv', 'rt') as f:
    reloaded = [row.strip() for row in f]

# alternate multiline write and read
# ----------------------------------
# lines = [sep.join(map(str, row)) + '\n' for row in zip(col1, col2)]
# with open('test.csv', 'wt') as f:
#     f.writelines(lines)
#
# with open('test.csv', 'rt') as f:
#     reloaded = [line.strip() for line in f.readlines()]

# reloaded data is a list of strings
print(reloaded)  #= ['1,1.1', '2,2.2', '3,3.3', '4,4.4']

# split each item into component sequences
r_col1, r_col2 = zip(*[item.split(sep) for item in reloaded])
print(r_col1)  #= ('1', '2', '3', '4')
print(r_col2)  #= ('1.1', '2.2', '3.3', '4.4')
# tuples of text, which can be parsed as int or float into lists
```
Notes: Multiple column csv
Multiple columns first need to be paired with each other, here with zip(), to create rows of data. Each row is joined with the separator into a string and written to the file as a text line, with a newline character after each line. The data read back is a list of strings, one per row, which must be split on the separator and transposed with zip(*...) into individual sequences for each column.
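The standard library csv module can take over the joining, splitting and newline handling. The sketch below swaps it in for the manual version. It writes to a temporary directory instead of the working directory, and when parsing the text back it assumes the first column holds integers and the second floats.

```python
import csv
import os
import tempfile

col1 = [1, 2, 3, 4]
col2 = [1.1, 2.2, 3.3, 4.4]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.csv')
    # newline='' lets the csv module control line endings
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(zip(col1, col2))  # one row per (col1, col2) pair
    with open(path, newline='') as f:
        rows = list(csv.reader(f))         # each row is a list of strings

# split rows back into columns, parsing the text (assumed int and float)
r_col1 = [int(a) for a, _ in rows]
r_col2 = [float(b) for _, b in rows]
print(r_col1)  # [1, 2, 3, 4]
print(r_col2)  # [1.1, 2.2, 3.3, 4.4]
```

The csv module also quotes fields that contain the separator, which the manual join does not.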
Efficient and simple csv read and write with pandas.
Py3: Pandas dataframe
```python
import pandas as pd

col1 = [1, 2, 3, 4]
col2 = [1.1, 2.2, 3.3, 4.4]

# convert to a pandas DataFrame and write to csv
df = pd.DataFrame({'col1': col1, 'col2': col2})
df.to_csv('pandatext.csv', sep=',', index=False, header=True)

# read from csv
df_r = pd.read_csv('pandatext.csv')

# convert columns back to lists
r_col1 = df_r['col1'].tolist()
r_col2 = df_r['col2'].tolist()

print(r_col1)  #= [1, 2, 3, 4]
print(r_col2)  #= [1.1, 2.2, 3.3, 4.4]
```
Notes: Handling data formatting with pandas read_csv
The pandas module has a built-in csv writer (DataFrame.to_csv) and reader (pd.read_csv) whose header, separator and index handling can be customized. When reading, read_csv also infers a data type for each column. Writing the data therefore only takes building a DataFrame with one column per list and saving it as a text file. Reading the file back restores the header, with pandas determining the column types automatically, and each column can be converted back into a list with tolist(). The whole process is simple and efficient.
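A short sketch of the customization and type inference described above, assuming pandas is installed. It writes with a semicolon separator to an in-memory io.StringIO buffer instead of a file, lets read_csv infer the column types, and then overrides the inferred type of one column with the dtype parameter.

```python
import io

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [1.1, 2.2, 3.3, 4.4]})

# write with a custom separator to an in-memory text buffer
buffer = io.StringIO()
df.to_csv(buffer, sep=';', index=False)

# read_csv infers the column types: integers and floats here
buffer.seek(0)
inferred = pd.read_csv(buffer, sep=';')
print(inferred.dtypes)

# the inference can be overridden per column with dtype=
buffer.seek(0)
forced = pd.read_csv(buffer, sep=';', dtype={'col1': 'float64'})
print(forced['col1'].tolist())  # [1.0, 2.0, 3.0, 4.0]
```

The same sep value must be passed to read_csv that was used in to_csv; otherwise each line is read as a single column.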