
Python pandas: getting specified rows and columns from a CSV file

This post summarizes common pandas operations for selecting rows and columns from a CSV file.

import pandas as pd

house_info = pd.read_csv('house_info.csv')

1: Selecting rows:

house_info.loc[3:6] is similar to Python's slice operation, except that .loc slices by label and includes both endpoints of the range.
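A minimal sketch of the row-selection behavior. Since the article's house_info.csv is not available, a small DataFrame is built in code instead (the column names here are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for the article's house_info.csv
house_info = pd.DataFrame({
    'price': [100, 200, 150, 300, 250, 180, 220, 90],
    'tradetypename': ['sale'] * 8,
})

# .loc slices by label and, unlike Python list slicing, includes BOTH endpoints
rows = house_info.loc[3:6]
print(len(rows))  # 4 rows (labels 3, 4, 5 and 6), not 3
```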

2: Selecting a column:

house_info['price'] selects a column by name; by default the column names come from the first line of the CSV file (the header row) when it is read.
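To illustrate, a sketch with an in-code DataFrame standing in for the CSV (column names assumed): selecting one column returns a pandas Series.

```python
import pandas as pd

# Small example frame standing in for house_info.csv
house_info = pd.DataFrame({'price': [100, 200, 150], 'area': [50, 80, 60]})

price = house_info['price']   # a single column comes back as a Series
print(type(price).__name__)   # Series
print(price.tolist())         # [100, 200, 150]
```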

3: Selecting multiple columns:

house_info[['price', 'tradetypename']] works the same way for any number of columns. Note that the column names must be passed as a list inside the brackets (hence the double brackets), otherwise an error will be raised.
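A sketch of the double-bracket rule, again using an assumed in-code DataFrame: a list of names returns a DataFrame, while passing the names without the inner list raises a KeyError.

```python
import pandas as pd

house_info = pd.DataFrame({
    'price': [100, 200, 150],
    'tradetypename': ['sale', 'rent', 'sale'],
    'area': [50, 80, 60],
})

# A list of names inside the brackets selects several columns as a DataFrame
subset = house_info[['price', 'tradetypename']]
print(subset.columns.tolist())

# Without the inner list, pandas looks for a single column named
# ('price', 'tradetypename') and raises a KeyError
try:
    house_info['price', 'tradetypename']
    ok = False
except KeyError:
    ok = True
print(ok)
```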

4: Adding a column:

house_info['adress_new'] = list([......]) is somewhat like assigning to a dictionary key; the assigned list must have exactly one element per row of the DataFrame.
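A minimal sketch of the dictionary-style column assignment (the column name and values are illustrative, matching the article's `adress_new`):

```python
import pandas as pd

house_info = pd.DataFrame({'price': [100, 200, 150]})

# Assigning a list to a new key adds a column, much like inserting
# into a dict; the list length must match the number of rows
house_info['adress_new'] = ['a', 'b', 'c']
print(house_info.columns.tolist())  # ['price', 'adress_new']
```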

5: Dividing a column by its maximum value scales the values into the range 0 to 1 (assuming the values are non-negative), which is a simple normalization operation:

house_info['price'] / house_info['price'].max()
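A quick sketch of the normalization step on assumed sample values:

```python
import pandas as pd

house_info = pd.DataFrame({'price': [100, 200, 400]})

# Dividing by the column maximum maps non-negative values into [0, 1];
# the maximum itself becomes exactly 1.0
normalized = house_info['price'] / house_info['price'].max()
print(normalized.tolist())  # [0.25, 0.5, 1.0]
```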

6: Sorting by a column:

house_info.sort_values('price', inplace=True, ascending=True) — note the method is sort_values, not sorted_values. inplace=True sorts the existing DataFrame in place; with inplace=False (the default) a new sorted DataFrame is returned instead. ascending=True means ascending order, which is also the default. Another thing to note is that missing values (NaN) are placed at the end.
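A sketch of the sorting behavior on assumed sample data, showing both the in-place sort and the NaN-goes-last rule:

```python
import pandas as pd
import numpy as np

house_info = pd.DataFrame({'price': [300.0, np.nan, 100.0, 200.0]})

# sort_values with inplace=True modifies house_info itself;
# ascending=True is the default, and NaN rows are placed at the end
house_info.sort_values('price', inplace=True, ascending=True)
print(house_info['price'].tolist())  # [100.0, 200.0, 300.0, nan]
```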

7: How to find the missing values:

column_null = pd.isnull(column)

column_is_null_true = column[column_null]

Here pd.isnull returns a boolean mask, which is then used to select only the entries that are missing.
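The two lines above can be sketched end to end on an assumed sample Series:

```python
import pandas as pd
import numpy as np

column = pd.Series([1.0, np.nan, 3.0, np.nan])

# pd.isnull returns a boolean mask marking the missing entries;
# indexing the Series with that mask keeps only the NaN rows
column_null = pd.isnull(column)
column_is_null_true = column[column_null]
print(column_null.tolist())      # [False, True, False, True]
print(len(column_is_null_true))  # 2
```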

Summary

The above covers the pandas operations for getting specified rows and columns from a CSV file. I hope it is helpful to you. If you have any questions, please leave me a message and I will reply in time. Thank you very much for supporting our website!


Time: 2019-07-12
