Methods to Convert Tab-Separated Files into a Data Frame
Method 1: Using pandas ‘read_csv()’ with ‘sep’ parameter
In this method, we will use the Pandas library to read a tab-separated file into a data frame.
Look at the following code snippet.
- We have imported the pandas library and defined the path of the tab-separated file.
- Then, we use the 'pd.read_csv()' function to read the contents of the tab-separated file into a DataFrame, passing sep='\t' to specify that the fields are separated by tabs.
- By default, 'read_csv()' assumes a comma delimiter rather than detecting one, so sep='\t' is what makes it parse the file correctly.
import pandas as pd
file_path = "file.tsv"
df = pd.read_csv(file_path,sep='\t')
df.head()
Output:
0 50 5 881250949
0 0 172 5 881250949
1 0 133 1 881250949
2 196 242 3 881250949
3 186 302 3 891717742
4 22 377 1 878887116
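In the output above, the first data row appears to have been used as the column labels, which suggests the file has no header row. A minimal sketch of one way to handle that, assuming the file really is headerless: pass header=None and supply labels through names. The sample text and the column names below are hypothetical and stand in for file.tsv.

```python
import io
import pandas as pd

# Hypothetical sample mirroring the first rows shown above (no header line).
sample = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n0\t133\t1\t881250949\n"

# Assumed column labels; the original file does not name its columns.
columns = ["user_id", "item_id", "rating", "timestamp"]

# header=None keeps the first line as data instead of using it as labels.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=columns)
print(df.head())
```

With a real file, io.StringIO(sample) would be replaced by the file path.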
Method 2: Using pandas ‘read_table()’ function
In the following code snippet, we have again used the pandas library in Python to read the contents of a tab-separated file named 'file.tsv' into a DataFrame named 'df'. The pd.read_table() function is used for this task; its default separator is a tab (sep='\t'), so no delimiter needs to be specified.
import pandas as pd
df = pd.read_table('file.tsv')
df.head()
Output:
0 50 5 881250949
0 0 172 5 881250949
1 0 133 1 881250949
2 196 242 3 881250949
3 186 302 3 891717742
4 22 377 1 878887116
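To illustrate that read_table() is read_csv() with a tab as the default separator, the sketch below reads the same text both ways and compares the results. The two-column sample is made up for the comparison and is not the article's file.

```python
import io
import pandas as pd

# Made-up tab-separated text with a header line.
sample = "a\tb\n1\t2\n3\t4\n"

# read_table uses a tab separator by default.
df_table = pd.read_table(io.StringIO(sample))

# read_csv needs the separator spelled out.
df_csv = pd.read_csv(io.StringIO(sample), sep="\t")

# Both calls should produce the same DataFrame.
print(df_table.equals(df_csv))
```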
Method 3: Using csv module
The code example begins by importing the csv module, which provides functionality for reading and writing CSV files.
- Uses the open() function to open the file specified by file_path in read-only mode ('r'), inside a with statement to ensure the file is closed after reading.
- Creates a CSV reader object using csv.reader(file, delimiter='\t'), specifying that the values in the file are tab-separated.
- Passes the reader to pd.DataFrame() to build the DataFrame. The csv module returns every value as a string, and the columns are labelled 0, 1, 2, and so on.
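import csv
import pandas as pd  # needed for pd.DataFrame below
file_path = "file.tsv"
with open(file_path, 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    # Build the DataFrame while the file is still open
    df = pd.DataFrame(reader)
df.head()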
Output:
0 1 2 3
0 0 50 5 881250949
1 0 172 5 881250949
2 0 133 1 881250949
3 196 242 3 881250949
4 186 302 3 891717742
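Because csv.reader yields strings, the DataFrame built this way holds text rather than numbers. The sketch below shows one possible follow-up step, converting the columns with pd.to_numeric. The sample text and column names are hypothetical placeholders for file.tsv.

```python
import csv
import io
import pandas as pd

# Hypothetical sample standing in for file.tsv (no header line).
sample = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n"

# Assumed column labels, used only for this illustration.
columns = ["user_id", "item_id", "rating", "timestamp"]

reader = csv.reader(io.StringIO(sample), delimiter="\t")
# Every value arrives as a string at this point.
df = pd.DataFrame(list(reader), columns=columns)

# Convert each column from text to a numeric dtype.
df = df.apply(pd.to_numeric)
print(df.dtypes)
```

This extra step is not needed with read_csv() or read_table(), which infer numeric types on their own.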
Method 4: Use ‘numpy’ to load the data and then convert to a DataFrame
This code uses NumPy's 'genfromtxt()' function to load tab-separated data from 'file.tsv' into a NumPy array. The delimiter='\t' argument sets the tab delimiter, and dtype=None lets NumPy infer the data type of each column. The array is then converted into a pandas DataFrame for further analysis and manipulation.
import numpy as np
import pandas as pd
data = np.genfromtxt('file.tsv', delimiter='\t', dtype=None, encoding=None)
df = pd.DataFrame(data)
df.head()
Output:
0 1 2 3
0 0 50 5 881250949
1 0 172 5 881250949
2 0 133 1 881250949
3 196 242 3 881250949
4 186 302 3 891717742
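As with the csv approach, the resulting columns are labelled 0 to 3. A minimal sketch of attaching labels during the conversion, assuming every column is an integer so that genfromtxt returns a plain 2-D array. The sample text and column names are hypothetical.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical all-integer sample standing in for file.tsv.
sample = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n196\t242\t3\t881250949\n"

# dtype=None asks NumPy to infer the type of each column.
data = np.genfromtxt(io.StringIO(sample), delimiter="\t", dtype=None, encoding="utf-8")

# Assumed column labels, passed when building the DataFrame.
columns = ["user_id", "item_id", "rating", "timestamp"]
df = pd.DataFrame(data, columns=columns)
print(df.head())
```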
How to convert tab-separated file into a dataframe using Python
In this article, we will learn how to convert a TSV file into a data frame using Python and the Pandas library.
A TSV (Tab-Separated Values) file is a plain text file where data is organized in rows and columns, with each column separated by a tab character.
- It is a type of delimiter-separated file, similar to CSV (Comma-Separated Values).
- Tab-separated files are commonly used in data manipulation and analysis, and being able to convert them into a data frame can greatly enhance our ability to work with structured data efficiently.
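To make the format concrete, the sketch below takes a made-up two-line TSV string and splits each line on the tab character, which is essentially what the parsers in the methods above do before building a table.

```python
# A made-up TSV snippet: a header line followed by one record.
raw = "name\tscore\nalice\t90\n"

# Each line is one row; a tab character separates the columns.
rows = [line.split("\t") for line in raw.splitlines()]
print(rows)
```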