Because I am not allowed to use pandas, I am trying to do a left join between two csv files by hand, and I am struggling. Here is an example:
import csv

def read_csv(path):
    # Read a CSV file into a list of tuples, one per line (header included)
    content_list = []
    with open(path, "r") as file:
        for line in file.readlines():
            record = line.split(",")
            for item in range(len(record)):
                record[item] = record[item].replace("\n", "")
            content_list.append(tuple(record))
    return content_list

lookup_list = read_csv("lookup.csv")
data_list = read_csv("data.csv")
print("list with id and name:")
print(lookup_list)
print("list with id, age, weight:")
print(data_list)

result = list()
# Index the data rows by id, skipping the header row
data_dict = {x[0]: x for x in data_list[1:]}
for left in lookup_list[1:]:
    if left[0] in data_dict:
        # Append the name column(s) to the matching data row
        result.append(data_dict.get(left[0]) + left[1:])
print("Result of merge:")
print(result)
list with id and name:
[('id', 'name'), ('123', 'Robin'), ('221', 'Larry'), ('331', 'Wilson'), ('412', 'Jack')]
list with id, age, weight:
[('id', 'age', 'weight'), ('123', '47', '320'), ('221', '47', '190'), ('331', '25', '225'), ('412', '21', '180'), ('110', '14', '150')]
Result of merge:
[('123', '47', '320', 'Robin'), ('221', '47', '190', 'Larry'), ('331', '25', '225', 'Wilson'), ('412', '21', '180', 'Jack')]
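Since lookup_list has no entry for id 110, that row is not included in the result. I need it to be included, with an empty value for "name". This is where I am stuck.
This would be much easier with pandas, but our automation engineers restrict us to the libraries/modules that ship with the standard Python distribution.
Thanks in advance for your help.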
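uj5u.com user replied:
This solution does what I described and reads each file into a dictionary keyed by id. You can then use the merged result to write a new CSV file.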
import csv
from pprint import pprint

def read_csv(path):
    # Read a CSV file into a dict keyed by the first column;
    # each value maps the header names to that row's fields
    contents = {}
    header = []
    with open(path, "r") as file:
        for line in file.readlines():
            record = line.strip().split(",")
            if not header:
                header = record
            else:
                contents[record[0]] = {a: b for a, b in zip(header, record)}
    return contents

lookup_list = read_csv("xxx.csv")
data_list = read_csv("yyy.csv")
print("list with id and name:")
pprint(lookup_list)
print("list with id, age, weight:")
pprint(data_list)

for k, v in data_list.items():
    if k not in lookup_list:
        # No lookup entry for this id: start with an empty name
        lookup_list[k] = {'name': ''}
    lookup_list[k].update(v)
print("Result of merge:")
pprint(lookup_list)
Output:
list with id and name:
{'123': {'id': '123', 'name': 'Robin'},
'221': {'id': '221', 'name': 'Larry'},
'331': {'id': '331', 'name': 'Wilson'},
'412': {'id': '412', 'name': 'Jack'}}
list with id, age, weight:
{'110': {'age': '14', 'id': '110', 'weight': '150'},
'123': {'age': '47', 'id': '123', 'weight': '320'},
'221': {'age': '47', 'id': '221', 'weight': '190'},
'331': {'age': '25', 'id': '331', 'weight': '255'},
'412': {'age': '21', 'id': '412', 'weight': '180'}}
Result of merge:
{'110': {'age': '14', 'id': '110', 'name': '', 'weight': '150'},
'123': {'age': '47', 'id': '123', 'name': 'Robin', 'weight': '320'},
'221': {'age': '47', 'id': '221', 'name': 'Larry', 'weight': '190'},
'331': {'age': '25', 'id': '331', 'name': 'Wilson', 'weight': '255'},
'412': {'age': '21', 'id': '412', 'name': 'Jack', 'weight': '180'}}
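The answer mentions writing the merged result back out as a CSV file but does not show that step. The following is a minimal sketch of one way to do it with `csv.DictWriter`. The helper name `write_merged`, the column order, and the in-memory buffer standing in for a real output file are assumptions for illustration, not part of the original answer.

```python
import csv
import io

def write_merged(merged, fieldnames, out_file):
    """Write a dict-of-dicts merge result as CSV rows in `fieldnames` order."""
    # restval fills in any column missing from a row with an empty string
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for key in sorted(merged):
        writer.writerow(merged[key])

# A small sample shaped like the merged dictionary printed above
merged = {
    "110": {"id": "110", "name": "", "age": "14", "weight": "150"},
    "123": {"id": "123", "name": "Robin", "age": "47", "weight": "320"},
}

# io.StringIO stands in for open("merged.csv", "w", newline="")
buffer = io.StringIO()
write_merged(merged, ["id", "name", "age", "weight"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file, opening it with `newline=""` is what the `csv` module documentation recommends, so that the writer controls the line endings itself.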
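Follow-up
For the sake of discussion, here is how this could be done in sqlite. I suppose everyone has to judge for themselves whether this is better.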
import csv
from pprint import pprint
import sqlite3

db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE lookup (id int, name text);')
db.execute('CREATE TABLE data (id int, age int, weight int);')

def read_csv(db, table, path):
    # Insert every data row of the CSV file into the given table
    cur = db.cursor()
    header = []
    for line in open(path).readlines():
        if not header:
            # The header line doubles as the column list of the INSERT
            header = line.rstrip()
            continue
        record = line.strip().split(",")
        sql = f"INSERT INTO {table} ({header}) VALUES ("
        sql += ','.join(['?']*len(record)) + ");"
        cur.execute(sql, record)

# read_csv returns nothing, so there is no result to assign
read_csv(db, "lookup", "xxx.csv")
read_csv(db, "data", "yyy.csv")

cur = db.cursor()
for row in cur.execute(
    "SELECT data.id,lookup.name,data.age,data.weight FROM data LEFT JOIN lookup ON lookup.id = data.id;"):
    print(row)
Output:
(123, 'Robin', 47, 320)
(221, 'Larry', 47, 190)
(331, 'Wilson', 25, 255)
(412, 'Jack', 21, 180)
(110, None, 14, 150)
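In the output above, the unmatched row carries `None` for the name, whereas the question asks for an empty value. One possible way to get that directly from the query is SQLite's `COALESCE` function, sketched below. The inline sample rows and the added `ORDER BY` are assumptions made to keep the example self-contained and its output deterministic.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lookup (id int, name text)")
db.execute("CREATE TABLE data (id int, age int, weight int)")

# Sample rows standing in for the contents of the two CSV files
db.executemany("INSERT INTO lookup VALUES (?, ?)",
               [(123, "Robin"), (221, "Larry")])
db.executemany("INSERT INTO data VALUES (?, ?, ?)",
               [(123, 47, 320), (221, 47, 190), (110, 14, 150)])

# COALESCE returns its first non-NULL argument, so a missing name becomes ''
rows = db.execute(
    "SELECT data.id, COALESCE(lookup.name, '') AS name, data.age, data.weight "
    "FROM data LEFT JOIN lookup ON lookup.id = data.id "
    "ORDER BY data.id"
).fetchall()
for row in rows:
    print(row)
db.close()
```

Here id 110 has no match in `lookup`, so its name column comes back as an empty string rather than `None`.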
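uj5u.com user replied:
sqlite3 is included in the standard Python distribution.
You can create an in-memory database, load the csv contents into tables, and then perform an actual left join.
See this answer for creating a sqlite database from a csv: Importing a CSV file into a sqlite3 database table using Python
Create the tables using the method shown in that answer. Suppose you named the tables t_lookup and t_data, and the database connection conn1.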
cursor = conn1.cursor()
cursor.execute('''
    SELECT t1.*, t2.name
    FROM
        t_data t1
    LEFT JOIN
        t_lookup t2
    ON t1.id = t2.id;''')
left_result = cursor.fetchall()
for row in left_result:
    print(row)
conn1.close()
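The table-creation step is only referenced here, not shown. As a rough sketch of what it might look like using just the standard library, the snippet below loads two CSV sources into `t_lookup` and `t_data` and runs the same kind of join. The helper `load_csv`, the untyped columns, the added `ORDER BY`, and the `io.StringIO` objects standing in for real files are all assumptions for illustration; this is not the method from the linked answer.

```python
import csv
import io
import sqlite3

def load_csv(conn, table, columns, csv_file):
    """Create `table` with the given columns and fill it from csv_file.

    The table and column names are interpolated into the SQL text,
    so they must come from trusted code, never from user input.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" * len(columns))
    # Skip blank lines, which csv.reader yields as empty lists
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})",
                     (row for row in reader if row))

conn1 = sqlite3.connect(":memory:")
# StringIO objects stand in for open("lookup.csv", newline="") and so on
lookup_csv = io.StringIO("id,name\n123,Robin\n221,Larry\n")
data_csv = io.StringIO("id,age,weight\n123,47,320\n110,14,150\n")
load_csv(conn1, "t_lookup", ["id", "name"], lookup_csv)
load_csv(conn1, "t_data", ["id", "age", "weight"], data_csv)

left_result = conn1.execute(
    "SELECT t1.*, t2.name FROM t_data t1 "
    "LEFT JOIN t_lookup t2 ON t1.id = t2.id ORDER BY t1.id"
).fetchall()
print(left_result)
conn1.close()
```

Because the columns are declared without types in this sketch, every value is stored as text, so the ids are compared and sorted as strings.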