Latest News

Wednesday, January 16, 2019

Remove Carriage Return from CSV Using Python



To remove carriage return from CSV using Pandas follow below steps.


import re
import io
import pandas as pd
import numpy as np

write_header = True
with open('c:/abc.csv', 'r' ,encoding='iso-8859-1') as src:
    data = src.read()
    df = pd.read_csv(io.StringIO(re.sub('"\s*\n','"',data)), chunksize=1000)   
    for chunk in df:
        chunk = chunk.replace('(?!(([^"]*"){2})*[^"]*$),', '', regex=True)
        clean_chunk = pd.DataFrame(chunk.replace({r'\r\n': ''}, regex=True))
        for col in clean_chunk.columns:
            if clean_chunk[col].dtype == np.object_:
                clean_chunk[col] = clean_chunk[col].str.replace('\n','')
        clean_chunk.to_csv('c:/clean_abc.csv', sep = ",", index=False, mode='a', encoding='iso-8859-1',quotechar='"',
                      quoting=csv.QUOTE_NONNUMERIC,header=write_header )
        write_header = False # Header is marked as False, so that for all chunks we should have just one header in csv
            
  • Google+
  • Pinterest
« PREV
NEXT »

1 comment

  1. Additionally, we choose that the two first bits (at the correct side of the typical composition of a double number) symbolize the condition of the entryway Down and the two after bits (setting off to one side of the standard composition of a parallel number) speak to the one of the entryway Right. So here are the equivalences of some decimal numbers: ExcelR Data Science Courses

    ReplyDelete