How to build a JSON file with nested records from a flat data table?
Problem description
I'm looking for a Python technique to build a nested JSON file from a flat table in a pandas data frame. For example, how could a pandas data frame table such as:
   teamname  member  firstname  lastname  orgname  phone         mobile
0  1         0       John       Doe       Anon     916-555-1234
1  1         1       Jane       Doe       Anon     916-555-4321  916-555-7890
2  2         0       Mickey     Moose     Moosers  916-555-0000  916-555-1111
3  2         1       Minny      Moose     Moosers  916-555-2222
be taken and exported to a JSON that looks like:
{
  "teams": [
    {
      "teamname": "1",
      "members": [
        {
          "firstname": "John",
          "lastname": "Doe",
          "orgname": "Anon",
          "phone": "916-555-1234",
          "mobile": ""
        },
        {
          "firstname": "Jane",
          "lastname": "Doe",
          "orgname": "Anon",
          "phone": "916-555-4321",
          "mobile": "916-555-7890"
        }
      ]
    },
    {
      "teamname": "2",
      "members": [
        {
          "firstname": "Mickey",
          "lastname": "Moose",
          "orgname": "Moosers",
          "phone": "916-555-0000",
          "mobile": "916-555-1111"
        },
        {
          "firstname": "Minny",
          "lastname": "Moose",
          "orgname": "Moosers",
          "phone": "916-555-2222",
          "mobile": ""
        }
      ]
    }
  ]
}
I have tried doing this by creating a dict of dicts and dumping to JSON. This is my current code:
data = pandas.read_excel(inputExcel, sheet_name='SCAT Teams')
memberDictTuple = []

for index, row in data.iterrows():
    dataRow = row
    rowDict = dict(zip(columnList[2:], dataRow[2:]))
    teamRowDict = {columnList[0]: int(dataRow[0])}
    memberId = tuple(row[1:2])
    memberId = memberId[0]
    teamName = tuple(row[0:1])
    teamName = teamName[0]
    memberDict1 = {int(memberId): rowDict}
    memberDict2 = {int(teamName): memberDict1}
    # A new top-level dict is appended for every row, so rows are never merged per team
    memberDictTuple.append(memberDict2)

memberDictTuple = tuple(memberDictTuple)
formattedJson = json.dumps(memberDictTuple, indent=4, sort_keys=True)
print(formattedJson)
This produces the following output. Each item is nested at the correct level under "teamname" 1 or 2, but records should be nested together if they have the same teamname. How can I fix this so that teamname 1 and teamname 2 each have 2 records nested within?
[
    {
        "1": {
            "0": {
                "email": "john.doe@wildlife.net",
                "firstname": "John",
                "lastname": "Doe",
                "mobile": "none",
                "orgname": "Anon",
                "phone": "916-555-1234"
            }
        }
    },
    {
        "1": {
            "1": {
                "email": "jane.doe@wildlife.net",
                "firstname": "Jane",
                "lastname": "Doe",
                "mobile": "916-555-7890",
                "orgname": "Anon",
                "phone": "916-555-4321"
            }
        }
    },
    {
        "2": {
            "0": {
                "email": "mickey.moose@wildlife.net",
                "firstname": "Mickey",
                "lastname": "Moose",
                "mobile": "916-555-1111",
                "orgname": "Moosers",
                "phone": "916-555-0000"
            }
        }
    },
    {
        "2": {
            "1": {
                "email": "minny.moose@wildlife.net",
                "firstname": "Minny",
                "lastname": "Moose",
                "mobile": "none",
                "orgname": "Moosers",
                "phone": "916-555-2222"
            }
        }
    }
]
This is a solution that works and creates the desired JSON format. First, I grouped my dataframe by the appropriate columns. Then, instead of creating a dictionary (and losing data order) for each column heading/record pair, I created them as lists of tuples and transformed each list into an Ordered Dict. Another Ordered Dict was created for the two columns that everything else was grouped by. Precise layering between lists and ordered dicts was necessary for the JSON conversion to produce the correct format. Also note that when dumping to JSON, sort_keys must be set to False, or all your Ordered Dicts will be rearranged into alphabetical order.
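The effect of sort_keys on key order can be checked in isolation. The following is a minimal sketch, not part of the original solution; the sample record and the variable names kept and alphabetized are illustrative assumptions.

```python
import json
from collections import OrderedDict

# Illustrative record; the key order mirrors the column order of the table
record = OrderedDict([
    ("firstname", "John"),
    ("lastname", "Doe"),
    ("orgname", "Anon"),
    ("phone", "916-555-1234"),
    ("mobile", ""),
])

# sort_keys=False keeps the insertion order of the Ordered Dict
kept = json.dumps(record, sort_keys=False)

# sort_keys=True rearranges the keys alphabetically, moving "mobile" ahead of "orgname"
alphabetized = json.dumps(record, sort_keys=True)

print(kept)
print(alphabetized)
```

sort_keys already defaults to False in json.dumps, so leaving the argument out preserves the order as well.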
import json
from collections import OrderedDict

import pandas

# Raw strings keep the backslashes literal; in a plain string '\t' would become a tab
inputExcel = r'E:\teams.xlsx'
exportJson = r'E:\teams.json'

# Current pandas spells this keyword sheet_name (older releases used sheetname)
data = pandas.read_excel(inputExcel, sheet_name='SCAT Teams')

# This creates a tuple of column headings for later use matching them with column data
columnList = tuple(str(col) for col in data.columns)

# This groups the dataframe by the 'teamname' and 'members' columns
grouped = data.groupby(['teamname', 'members']).first()

# This creates a reference to the first index level of the groups (the team numbers)
tm = grouped.index.levels[0]

# Create a list to add team records to at the end of the first 'for' loop
teamsList = []

for teamN in tm:
    teamN = int(teamN)  # added this in to prevent TypeError: 1 is not JSON serializable
    tempList = []  # Create a temporary list to add each record to
    for index, row in grouped.iterrows():
        # Select the record in each row of the grouped dataframe if its index matches the team number
        if index[0] == teamN:
            # In order to have the JSON records come out in the same order, I had to
            # first create a list of tuples, then convert to an Ordered Dict
            rowDict = list(zip(columnList[2:], row))
            rowDict = OrderedDict(rowDict)
            tempList.append(rowDict)
    # Create another Ordered Dict to keep 'teamname' and the list of members from the temporary list sorted
    t = OrderedDict([('teamname', str(teamN)), ('members', tempList)])
    # Append the Ordered Dict to the empty list of teams created earlier
    teamsList.append(t)

# Create a final dictionary with a single item: the list of teams
teams = {"teams": teamsList}

# Dump to JSON format
# sort_keys MUST be set to False, or all dictionaries will be alphabetized
formattedJson = json.dumps(teams, indent=1, sort_keys=False)
# NaN is how pandas marks missing values; it must be replaced with "NULL" to be a valid JSON file
formattedJson = formattedJson.replace("NaN", '"NULL"')
print(formattedJson)

# Export to JSON file
with open(exportJson, "w") as parsed:
    parsed.write(formattedJson)
print("\nExport to JSON Complete")
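For comparison, the same nesting can be sketched more compactly by letting groupby hand back one sub-frame per team and converting each sub-frame to a list of records. This is only a minimal sketch under stated assumptions: the DataFrame is built in memory as a stand-in for the spreadsheet, the grouping columns are assumed to be named teamname and members as in the code above, build_teams is a hypothetical helper name, and missing values are filled with empty strings rather than replaced in the dumped text. On Python 3.7 and later, plain dicts keep insertion order, so OrderedDict is not needed here.

```python
import json

import pandas as pd

# Illustrative stand-in for the spreadsheet; real data would come from read_excel
data = pd.DataFrame({
    "teamname": [1, 1, 2, 2],
    "members": [0, 1, 0, 1],
    "firstname": ["John", "Jane", "Mickey", "Minny"],
    "lastname": ["Doe", "Doe", "Moose", "Moose"],
    "orgname": ["Anon", "Anon", "Moosers", "Moosers"],
    "phone": ["916-555-1234", "916-555-4321", "916-555-0000", "916-555-2222"],
    "mobile": [None, "916-555-7890", "916-555-1111", None],
})


def build_teams(frame):
    """Nest member records under their team (hypothetical helper)."""
    # Replace missing values up front so no NaN reaches json.dumps
    frame = frame.fillna("")
    teams = []
    # groupby yields one sub-frame per team, sorted by team number
    for team, rows in frame.groupby("teamname"):
        # Drop the grouping columns and turn each remaining row into a dict
        members = rows.drop(columns=["teamname", "members"]).to_dict("records")
        teams.append({"teamname": str(team), "members": members})
    return {"teams": teams}


teams = build_teams(data)
formatted_json = json.dumps(teams, indent=1)
print(formatted_json)
```

Filling missing values before dumping avoids the text replacement of NaN, which would also alter any field value that happens to contain those three letters.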