如何将嵌套的 json 文件加载到 pandas DataFrame

请帮助我似乎无法将 json 数据放入 Dataframe。

加载数据

data =json.load(open(r'path'))#this works fine and displays:

json data

{'type': 'FeatureCollection', 'name': 'Altstadt Nord', 'crs': {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}}, 'features': [{'type': 'Feature', 'properties': {'Name': 'City-Martinsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9595637, 50.9418396], [6.956624, 50.9417382], [6.9543173, 50.941603], [6.9529869, 50.9413664], [6.953062, 50.9408593], [6.9532873, 50.9396289], [6.9533624, 50.9388176], [6.9529333, 50.9378373], [6.9527509, 50.9371815], [6.9528367, 50.9360659], [6.9532122, 50.9352884], [6.9540705, 50.9350653], [6.9553258, 50.9350044], [6.9568815, 50.9351667], [6.9602074, 50.9355047], [6.9608189, 50.9349165], [6.9633939, 50.9348827], [6.9629433, 50.9410622], [6.9616236, 50.9412176], [6.9603898, 50.9414881], [6.9595637, 50.9418396]]]}}, {'type': 'Feature', 'properties': {'Name': 'Gereonsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9629433, 50.9410622], [6.9629433, 50.9431646], [6.9611408, 50.9433539], [6.9601752, 50.9436649], [6.9588234, 50.9443409], [6.9579651, 50.9449763], [6.9573213, 50.945801], [6.9563128, 50.9451926], [6.9551756, 50.9448546], [6.9535663, 50.9446518], [6.9523432, 50.9449763], [6.9494464, 50.9452602], [6.9473435, 50.9454495], [6.9466998, 50.9456928], [6.9458415, 50.946531], [6.9434168, 50.9453954], [6.9424726, 50.9451926], [6.9404342, 50.9429888], [6.9404771, 50.9425156], [6.9403269, 50.9415016], [6.9400479, 50.9405281], [6.9426228, 50.9399872], [6.9439103, 50.9400143], [6.9453051, 50.9404875], [6.9461634, 50.9408931], [6.9467427, 50.941096], [6.9475581, 50.9410013], [6.9504227, 50.9413191], [6.9529869, 50.9413664], [6.9547464, 50.9416368], [6.9595637, 50.9418396], [6.9603898, 50.9414881], [6.9616236, 50.9412176], [6.9629433, 50.9410622]]]}}, {'type': 'Feature', 'properties': {'Name': 'Kunibertsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9629433, 50.9431646], [6.9637129, 50.9454917], [6.9651506, 50.9479252], [6.9666097, 50.9499124], [6.9667599, 50.9500882], [6.9587777, 50.9502504], [6.9573213, 50.945801], [6.9579651, 50.9449763], [6.9588234, 50.9443409], [6.9601752, 50.9436649], [6.9611408, 50.9433539], [6.9629433, 50.9431646]]]}}, {'type': 'Feature', 'properties': {'Name': 'Nördlich Neumarkt', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9390331, 50.9364418], [6.9417153, 50.9358738], [6.9462214, 50.9358062], [6.9490109, 50.9355628], [6.9505129, 50.9353329], [6.9523798, 50.9352924], [6.9532122, 50.9352884], [6.9528367, 50.9360659], [6.9527509, 50.9371815], [6.9529333, 50.9378373], [6.9533624, 50.9388176], [6.9532381, 50.9398222], [6.9529869, 50.9413664], [6.9504227, 50.9413191], [6.9475581, 50.9410013], [6.9467427, 50.941096], [6.9453051, 50.9404875], [6.9439103, 50.9400143], [6.9424663, 50.9399574], [6.9400479, 50.9405281], [6.9390331, 50.9364418]]]}}]}

现在我似乎无法将它放入 Dataframe // pd.DataFrame(data) --> ValueError: Mixing dicts with non-Series may lead to ambiguous ordering. full error

我尝试使用 json_flatten 进行展平,但 ModuleNotFoundError: No module named 'flatten_json' 即使我通过 pip install 安装了 json-flatten

也试过 df =pd.DataFrame.from_dict(data,orient='index')

df
Out[22]: 
                                                          0
type                                      FeatureCollection
name                                          Altstadt Nord
crs       {'type': 'name', 'properties': {'name': 'urn:o...
features  [{'type': 'Feature', 'properties': {'Name': 'C...

df Out[22]

stack overflow How to load a nested json file into a pandas DataFrame
原文答案
author avatar

接受的答案

我认为您可以使用 json_normalize 将它们加载到pandas。

在这种情况下,test.json 是您的完整 json 文件(带双引号)。


import json
from pandas.io.json import json_normalize

with open('path_to_json.json') as f:
  data = json.load(f)
  df = json_normalize(data, record_path=['features'], meta=['name']) 

print(df)

这会产生如下所示的数据框。 Output Dataframe

您可以在 normalize 方法中进一步添加记录字段,以便为多边形坐标创建更多列。

您可以在 https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html 找到更多文档

希望有帮助。


答案:

作者头像

json 数据包含具有不同数据类型的元素,这些元素不能加载到一个数据帧中。

查看 json 中的数据类型:

[type(data[k]) for k in data.keys()]
# Out: [str, str, dict, list]

data.keys()
# Out: dict_keys(['type', 'name', 'crs', 'features'])

您可以将每个数据块加载到单独的数据框中,如下所示:

df_crs = pd.DataFrame(data['crs'])
df_features = pd.DataFrame(data['features'])

data['type']data['name'] 是字符串

data['type']
# Out 'FeatureCollection'

data['name']
# Out 'Altstadt Nord'