Python CSV, Find Max And Print The Information

My aim is to find the max of each individual column and print out the information. But there is a problem when I print some of the information. For example, for CSIT135 nothing was printed...

Solution 1:

You haven't specified the output format, so instead of printing the result I wrote a function that returns a dict: each key is a column name, and each value is a dict representing the row with the maximum value in that column.

I don't like the file-rewinding part, but it seems to be necessary: csv.DictReader reads from the file handle it received in its constructor and does not rewind that handle once iteration finishes. This might be why you only see one result with your code.

import csv

def get_maxes():
    with open("data.csv", "r", newline="") as data_file:
        data = csv.DictReader(data_file)

        # don't process the first 3 columns
        columns_to_process = data.fieldnames[3:]
        column_max = {}
        for column in columns_to_process:
            data_file.seek(0)  # rewind the file, since the previous pass left it at the end
            data_file.readline()  # skip the first line with the header

            # compare as integers: the csv module returns every value as a string
            column_max[column] = max(data, key=lambda row: int(row[column]))

    return column_max

if __name__ == '__main__':
    print(get_maxes())

Output:

{'CSIT110': {'CSIT110': '89',
             'CSIT121': '67',
             'CSIT135': '54',
             'CSIT142': '78',
             'first_name': 'Peter',
             'last_name': 'Tan',
             'student_id': 'S1012342D'},
 'CSIT121': {'CSIT110': '87',
             'CSIT121': '78',
             'CSIT135': '86',
             'CSIT142': '67',
             'first_name': 'John',
             'last_name': 'Lim',
             'student_id': 'S1014322H'},
 'CSIT135': {'CSIT110': '87',
             'CSIT121': '78',
             'CSIT135': '86',
             'CSIT142': '67',
             'first_name': 'John',
             'last_name': 'Lim',
             'student_id': 'S1014322H'},
 'CSIT142': {'CSIT110': '89',
             'CSIT121': '67',
             'CSIT135': '54',
             'CSIT142': '78',
             'first_name': 'Peter',
             'last_name': 'Tan',
             'student_id': 'S1012342D'}}
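
The exhaustion behaviour described above can be seen in isolation. The following is a minimal sketch, not the asker's code: `SAMPLE` is a hypothetical in-memory stand-in for data.csv containing only the two rows visible in the output above, and the function names `count_rows_twice` and `count_rows_with_rewind` are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for data.csv: only the two rows shown in the output above.
SAMPLE = (
    "first_name,last_name,student_id,CSIT110,CSIT121,CSIT135,CSIT142\n"
    "Peter,Tan,S1012342D,89,67,54,78\n"
    "John,Lim,S1014322H,87,78,86,67\n"
)


def count_rows_twice(text):
    """Iterate the same DictReader twice without rewinding the handle."""
    handle = io.StringIO(text)
    reader = csv.DictReader(handle)
    first_pass = sum(1 for _ in reader)   # consumes every data row
    second_pass = sum(1 for _ in reader)  # handle is at end of file, nothing left
    return first_pass, second_pass


def count_rows_with_rewind(text):
    """Same as above, but rewind the handle and skip the header in between."""
    handle = io.StringIO(text)
    reader = csv.DictReader(handle)
    first_pass = sum(1 for _ in reader)
    handle.seek(0)     # back to the start of the data
    handle.readline()  # skip the header line, which the reader already knows
    second_pass = sum(1 for _ in reader)
    return first_pass, second_pass


no_rewind = count_rows_twice(SAMPLE)
with_rewind = count_rows_with_rewind(SAMPLE)
```

With this sample, `no_rewind` is `(2, 0)` and `with_rewind` is `(2, 2)`: the second pass only sees rows again once the underlying handle has been rewound.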

EDIT:

If you consume all the rows at once from the DictReader you don't need to rewind the file:

import csv

def get_maxes():
    with open("data.csv", 'r') as data_file:
        data = csv.DictReader(data_file)
        columns_to_process = data.fieldnames[3:]  # don't process the first 3 columns
        rows = list(data)  # read all the rows from the DictReader once and keep them in a list

        column_max = {}
        for column in columns_to_process:
            # compare as integers: the csv module returns every value as a string
            column_max[column] = max(rows, key=lambda row: int(row[column]))

        return column_max

if __name__ == '__main__':
    import pprint
    pprint.pprint(get_maxes())
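
One caveat when taking the maximum of values read with the csv module: every field comes back as a string, so a key that compares the raw values orders them character by character rather than numerically. The sketch below uses hypothetical rows (the ids and scores are invented, not taken from data.csv) to show how the two comparisons can disagree once the scores differ in length.

```python
# Hypothetical rows shaped like the dicts produced by csv.DictReader.
rows = [
    {"student_id": "A", "CSIT110": "89"},
    {"student_id": "B", "CSIT110": "100"},
    {"student_id": "C", "CSIT110": "9"},
]

# Comparing the raw strings: "9" sorts after "89" and "100".
by_text = max(rows, key=lambda row: row["CSIT110"])

# Converting to int first gives the numeric maximum.
by_number = max(rows, key=lambda row: int(row["CSIT110"]))
```

Here `by_text` is the row for student C (score 9), while `by_number` is the row for student B (score 100). With scores that are all two digits long, both comparisons happen to agree.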

Solution 2:

Were you considering duplicating your code for each column and reading your file again and again? Choose DRY instead of WET.

The first step should be converting your file into something convenient for your problem. I decided to go with a dictionary: the keys are the column names and the values are lists of the values in each column.

import csv
datas = {}
with open("data.csv") as f:   
    csvreader = csv.DictReader(f) 
    for row in csvreader:
        for key in 'CSIT110', 'CSIT121', 'CSIT135', 'CSIT142':
            datas.setdefault(key, []).append(int(row[key]))

Now that I've got something to work on, I iterate over my dictionary and use the max() function.

for key, value in datas.items():
    max_value = max(value)
    print('key : {}, max : {}'.format(key, max_value))
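
The same column-wise dictionary can be built without hard-coding the column names. This is only a sketch under assumptions: it presumes the first three columns are identifying fields, as in the other answer, and it reads from a hypothetical in-memory `SAMPLE` holding just the two rows visible in the outputs on this page. The helper name `column_values` is invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for data.csv: only the two rows shown on this page.
SAMPLE = (
    "first_name,last_name,student_id,CSIT110,CSIT121,CSIT135,CSIT142\n"
    "Peter,Tan,S1012342D,89,67,54,78\n"
    "John,Lim,S1014322H,87,78,86,67\n"
)


def column_values(handle, skip=3):
    """Map each score column to the list of its integer values.

    `skip` is the assumed number of leading non-score columns.
    """
    reader = csv.DictReader(handle)
    score_columns = reader.fieldnames[skip:]
    columns = defaultdict(list)
    for row in reader:
        for name in score_columns:
            columns[name].append(int(row[name]))
    return dict(columns)


columns = column_values(io.StringIO(SAMPLE))
maxima = {name: max(values) for name, values in columns.items()}
```

For a real file, the `io.StringIO(SAMPLE)` argument would be replaced by a handle opened with `open("data.csv", newline="")`.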

You could also use pandas, which is well suited to this problem.

import pandas as pd
df = pd.read_csv('data.csv')
for key in 'CSIT110', 'CSIT121', 'CSIT135', 'CSIT142':
    print('key : {}, max : {}'.format(key, df[key].max()))

Result of both snippets:

key : CSIT110, max : 89
key : CSIT121, max : 78
key : CSIT135, max : 86
key : CSIT142, max : 78

Do you need all the information about the row containing the maximum value?

import pandas as pd
df = pd.read_csv('data.csv')
for key in 'CSIT110', 'CSIT121', 'CSIT135', 'CSIT142':
    row = df.loc[df[key].idxmax()]
    print('Max of {} : '.format(key))
    print(row)

Result:

Max of CSIT110 : 
first_name        Peter
last_name           Tan
student_id    S1012342D
CSIT110              89
CSIT121              67
CSIT135              54
CSIT142              78
Name: 0, dtype: object
Max of CSIT121 : 
first_name         John
last_name           Lim
student_id    S1014322H
CSIT110              87
CSIT121              78
CSIT135              86
CSIT142              67
Name: 1, dtype: object
Max of CSIT135 : 
first_name         John
last_name           Lim
student_id    S1014322H
CSIT110              87
CSIT121              78
CSIT135              86
CSIT142              67
Name: 1, dtype: object
Max of CSIT142 : 
first_name        Peter
last_name           Tan
student_id    S1012342D
CSIT110              89
CSIT121              67
CSIT135              54
CSIT142              78
Name: 0, dtype: object
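
Note that both the built-in max() and pandas idxmax() report only the first row when several rows share the maximum. If every tied row is wanted, one possible approach using only the standard library is sketched below. The rows and the helper name `rows_with_max` are hypothetical and chosen to force a tie; they are not taken from data.csv.

```python
def rows_with_max(rows, column):
    """Return every row whose integer value in `column` equals the column maximum."""
    best = max(int(row[column]) for row in rows)
    return [row for row in rows if int(row[column]) == best]


# Hypothetical rows with a deliberate tie in CSIT110.
rows = [
    {"student_id": "A", "CSIT110": "89"},
    {"student_id": "B", "CSIT110": "72"},
    {"student_id": "C", "CSIT110": "89"},
]

top = rows_with_max(rows, "CSIT110")
top_ids = [row["student_id"] for row in top]
```

With these rows, `top_ids` is `["A", "C"]`, whereas `max(rows, key=lambda row: int(row["CSIT110"]))` would return only the row for student A.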